Friday, March 28, 2025

Large Language Model [LLM] - Introduction

 


LLM stands for Large Language Model. It is a deep learning model trained on massive amounts of text data to understand and generate human language, enabling tasks such as text generation and translation. LLMs typically use "Transformer" architectures, neural networks that can model relationships between words in language.
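The core mechanism that lets Transformers model relationships within language is self-attention. A minimal sketch with NumPy (a toy illustration, not a real model) shows each token's output becoming a weighted blend of every token in the sequence:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of token vectors.

    Each output vector is a weighted mix of all input vectors, which is
    how Transformers capture relationships between words in a sentence.
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)       # pairwise token similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ X                  # blend tokens by attention weight

# Toy "sentence" of 4 tokens, each an 8-dim embedding.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = self_attention(tokens)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Real models add learned query/key/value projections, multiple heads, and stacked layers on top of this basic operation.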


Reasoning LLMs


Traditional LLM workflow



In the traditional LLM workflow, a large dataset is first refined and used for pretraining. The pretrained model is then fine-tuned on curated data to produce more precise outputs. Finally, human feedback is collected, and the model is corrected whenever its outputs do not match what the fine-tuning stage intended.
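The pipeline above (pretraining, then fine-tuning, then human-feedback correction) can be sketched as a sequence of stages. The function names below are illustrative placeholders, not a real training API:

```python
# Hypothetical sketch of the traditional LLM training pipeline:
# pretraining -> supervised fine-tuning -> human feedback corrections.

def pretrain(corpus):
    # Learn general language patterns from raw text.
    return {"stage": "pretrained", "seen": len(corpus)}

def fine_tune(model, labeled_examples):
    # Specialize the pretrained model on curated input/output pairs.
    return dict(model, stage="fine-tuned", examples=len(labeled_examples))

def apply_human_feedback(model, feedback):
    # Correct mismatches flagged by human reviewers (RLHF-style step).
    return dict(model, stage="aligned", corrections=len(feedback))

corpus = ["raw web text", "books", "articles"]
labeled = [("question", "answer")]
feedback = ["fix incorrect answer"]

model = pretrain(corpus)
model = fine_tune(model, labeled)
model = apply_human_feedback(model, feedback)
print(model["stage"])  # aligned
```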

Traditional LLMs:
  • Direct pattern-based prediction
  • Quick, but less reliable on complex tasks
  • No explicit reasoning steps

Reasoning LLMs:
  • Language models designed for complex, multi-step problems
  • Break down tasks into logical sub-tasks
  • Generate intermediate reasoning steps ("thought processes")
Key Capabilities of Reasoning LLMs:
1) Chain-of-Thought Reasoning
        Internal-dialogue approach
        Step-by-step problem solving
2) Self-Consistency
        Verifies its own answers
        Revisits problematic solutions
3) Structured Outputs
        Organized reasoning steps
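The chain-of-thought idea can be illustrated by contrasting a direct prompt with a step-by-step prompt. The model call below is simulated, and the prompt phrasing is an assumption, but it shows the shape of the technique:

```python
# Illustrative sketch: a chain-of-thought prompt vs. a direct prompt.
# simulated_reasoning() stands in for a real reasoning LLM call.

question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

direct_prompt = f"{question}\nAnswer:"
cot_prompt = f"{question}\nLet's think step by step:"

def simulated_reasoning(question):
    # Stand-in for a reasoning LLM: break the task into logical sub-steps.
    steps = [
        "12 pens is 12 / 3 = 4 groups of 3 pens.",
        "Each group costs $2, so 4 * 2 = $8.",
    ]
    return steps, "$8"

steps, answer = simulated_reasoning(question)
for step in steps:
    print("thought:", step)
print("answer:", answer)
```

A real reasoning model generates the intermediate "thought" lines itself; exposing them makes the final answer easier to verify.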

Practical Applications of Reasoning LLMs

Data Analysis
  • Medical diagnostics
  • Complex data interpretation
  • Anomaly detection

Background Processing
  • Batch processing workflows
  • Overnight analysis jobs

Evaluation Tasks
  • LLM as judge
  • Quality assessment
  • Verification workflows
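The "LLM as judge" pattern asks one model to grade another model's answers against a rubric. A hedged sketch, where judge() is a hypothetical stand-in for a real model call and the keyword rubric is invented for illustration:

```python
# Sketch of an "LLM as judge" evaluation loop. A real system would send
# a grading prompt to an LLM; here the score is simulated with a trivial
# keyword check so the example is self-contained.

def judge(question, candidate_answer):
    rubric_keywords = {"because", "therefore"}
    reasoning_shown = any(k in candidate_answer.lower() for k in rubric_keywords)
    return {"score": 5 if reasoning_shown else 2,
            "verdict": "pass" if reasoning_shown else "needs work"}

answers = [
    "12 pens cost $8 because 12/3 = 4 groups at $2 each.",
    "$8.",
]
results = [judge("How much do 12 pens cost?", a) for a in answers]
for r in results:
    print(r["verdict"], r["score"])
```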
Limitations of Reasoning LLMs
Performance Trade-offs
* Increased latency: the extended thinking process leads to significantly longer response times
* Higher resource requirements: reasoning models often require more computational resources
* Cost implications: more tokens and processing time translate to higher operational costs
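The cost implication follows directly from token counts. A back-of-envelope comparison, where the price and token counts are made-up assumptions for illustration, not real provider rates:

```python
# Hypothetical $/1K output tokens; real provider pricing varies.
PRICE_PER_1K_TOKENS = 0.01

def cost(output_tokens):
    return output_tokens / 1000 * PRICE_PER_1K_TOKENS

direct_tokens = 50        # short, direct answer
reasoning_tokens = 1200   # answer plus a long chain of intermediate "thoughts"

print(f"direct:    ${cost(direct_tokens):.4f}")
print(f"reasoning: ${cost(reasoning_tokens):.4f}")
print(f"ratio: {reasoning_tokens / direct_tokens:.0f}x more tokens")
```

The same multiplier applies to latency: generating 24x the tokens takes roughly 24x the decoding time.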
DeepSeek:

    DeepSeek applied supervised fine-tuning to refine the models' capabilities. This involved training on datasets containing reasoning and non-reasoning tasks. Notably, reasoning data was generated by specialized "expert models" trained for specific domains such as mathematics, programming, and logic. These expert models were developed through supervised fine-tuning on both original responses and synthetic data generated by internal models like DeepSeek-R1-Lite. The use of expert models allowed DeepSeek to generate high-quality synthetic reasoning data to enhance the primary model's performance.
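The expert-model pipeline described above can be sketched in outline: domain experts generate synthetic reasoning traces, which are collected into a supervised fine-tuning dataset. All function names here are hypothetical placeholders, not DeepSeek's actual code:

```python
# Sketch: expert models per domain produce synthetic reasoning data,
# which is pooled into an SFT dataset for the primary model.

def expert_generate(domain, problem):
    # Stand-in for a domain expert model (math, programming, logic)
    # producing a reasoning trace plus a final answer.
    return {"domain": domain, "problem": problem,
            "trace": f"[{domain} reasoning steps for: {problem}]",
            "answer": "[answer]"}

def build_sft_dataset(problems_by_domain):
    dataset = []
    for domain, problems in problems_by_domain.items():
        for p in problems:
            dataset.append(expert_generate(domain, p))
    return dataset

problems = {
    "math": ["Solve 2x + 3 = 11"],
    "programming": ["Reverse a linked list"],
}
sft_data = build_sft_dataset(problems)
print(len(sft_data))  # 2
```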





