Yali.ai
Role: Machine Learning Engineer
Period: June 2022 – Present
- Lead Developer and Maintainer, Proprietary Retrieval-Augmented Generative Engine:
- Led the transition of the core system from an extractive BERT-style model to a fine-tuned LLaMA-based Large Language Model (LLM), enabling abstractive, high-quality answers generated from client data.
- Improved the retrieval pipeline's performance by 40%, as measured by Mean Reciprocal Rank (MRR) and normalized Discounted Cumulative Gain (nDCG), by implementing custom relevance checks that also raised answer generation quality.
- Engineered a low-latency pipeline that responds to client queries in under 100 milliseconds.
- Continuously improved the system by integrating recent research findings, enhancing both retrieval efficiency and answer generation.
- Lead Developer and Maintainer, Proprietary Large Language Model (LLM) API Deployment System:
- Built an LLM API deployment system capable of launching any Large Language Model as an API in under 10 seconds. The system supports a range of inference engines and quantization methods, including GPTQ, CTranslate2, vLLM, and AWQ.
- Engineered a dynamic system optimization framework that improves memory usage, throughput, and latency. Implemented batched processing and in-flight batching, and developed routing algorithms that distribute requests between latency-optimized and throughput-optimized engines based on query type.
- Wrote custom code to use LLMs as classification models with an integrated confidence scoring mechanism, achieving high-quality classification results 2x faster than standard generative responses.
- Lead Developer and Maintainer, Proprietary Speech Transcript Enrichment API:
- Led the development of a speech transcript enrichment API that integrates conversational summarization models, sentiment analysis, and bespoke hate speech classification models, all built from scratch.
- Designed and implemented distilled, low-latency models for conversational sentiment analysis, leveraging 70B-parameter Large Language Models (LLMs). These models provide high-accuracy sentiment assessments with response times under 50 ms.
- Applied prompt-tuning techniques to proprietary LLMs to improve the quality and contextual accuracy of generated conversational summaries.
- Integrated the API with proprietary real-time speech processing systems to handle live data streams.
- Working in a startup environment, independently conceptualized and developed the APIs based on direct input from the CEO, iterating rapidly to adapt the solution to evolving business needs.