LLM:
- A language model (LM) is a probabilistic model of text.
Encoders:
- Models that convert a sequence of words into an embedding (a vector representation of the text).
Decoders:
- Models that take a sequence of words and output the next word.
- Examples: GPT-4, Llama, BLOOM
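To make the decoder idea concrete, here is a toy sketch of the generation loop: score every vocabulary word given the context, turn the scores into probabilities, and append the most likely word. The vocabulary and scores are made up for illustration; a real decoder computes the scores with a neural network.

```python
import math

# Toy decoder-style generation loop. The "scores" are hard-coded
# preferences, standing in for what a real neural network would compute.
VOCAB = ["the", "cat", "sat", "down", "<eos>"]

NEXT = {  # hypothetical learned preferences: last word -> favored next word
    "the": "cat", "cat": "sat", "sat": "down", "down": "<eos>",
}

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def next_word_probs(context):
    favored = NEXT.get(context[-1], "<eos>")
    scores = [4.0 if w == favored else 0.0 for w in VOCAB]
    return softmax(scores)  # probability distribution over the vocabulary

def generate(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(words)
        word = VOCAB[probs.index(max(probs))]  # greedy: take the argmax
        if word == "<eos>":
            break
        words.append(word)
    return " ".join(words)

print(generate("the"))  # -> "the cat sat down"
```

This is greedy decoding; sampling instead of taking the argmax is what the temperature parameter (below) controls.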
Encoder-decoder models:
The encoder converts the input text into tokens and encodes them into a representation; the decoder then generates the output one token at a time.
Hallucination:
- Generated text that is non-factual and/or ungrounded (not supported by the model's input or training data).
LLM application:
Retrieval Augmented Generation (RAG)
- Primarily used in QA where the model has access to support documents for a query.
Code models:
- Instead of training on written language, these models are trained on code and code comments.
In-context learning and few shot prompting:
- In-context learning: conditioning an LLM with instructions and/or demonstrations of the task.
- K-shot prompting: Explicitly providing k examples of the intended task in the prompt.
- F-strings (formatted string literals) are a Python feature that can be used to create prompt templates for LLMs.
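The two ideas above combine naturally: an f-string template can assemble a k-shot prompt from a list of examples. The sentiment-classification task and examples below are invented for illustration.

```python
# A k-shot (here 2-shot) prompt built with Python f-strings.
# The task, examples, and query are made up for illustration.
examples = [
    ("I loved this movie!", "positive"),
    ("Terrible, a waste of time.", "negative"),
]
query = "The plot was gripping from start to finish."

# Render each demonstration, then slot everything into one template.
shots = "\n".join(f"Review: {text}\nSentiment: {label}" for text, label in examples)
prompt = f"""Classify the sentiment of each review as positive or negative.

{shots}
Review: {query}
Sentiment:"""

print(prompt)
```

The prompt deliberately ends at `Sentiment:` so the model's next tokens complete the label for the new review.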
Language Agents:
* A budding area of research in which LLM-based agents reason, plan, and take actions.
Some notable work in the space:
* ReAct
An iterative framework in which the LLM emits a thought, then acts, and observes the result.
* Toolformer
A pre-training technique in which strings in the training data are replaced with calls to tools that yield results.
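The ReAct thought/act/observe loop can be sketched in a few lines. The `llm()` function here is a scripted stand-in that replays canned steps for one fixed question; a real agent would call a language model at that point. The single tool is a toy calculator.

```python
# Minimal ReAct-style loop with a scripted "LLM" and one tool.
def calculator(expression):
    # Toy tool: evaluate an arithmetic expression. (eval is unsafe on
    # untrusted input; fine for this self-contained sketch.)
    return str(eval(expression))

TOOLS = {"calculator": calculator}

SCRIPT = [  # canned model outputs for the question "What is 6 * 7?"
    ("Thought: I should compute 6 * 7 with the calculator.",
     ("calculator", "6 * 7")),
    ("Thought: I now know the answer.", ("finish", "42")),
]

def llm(transcript, step):
    # A real LLM would condition on the transcript; we replay a script.
    return SCRIPT[step]

def run(question, max_steps=5):
    transcript = [f"Question: {question}"]
    for step in range(max_steps):
        thought, (action, arg) = llm(transcript, step)
        transcript.append(thought)          # emit a thought
        if action == "finish":
            return arg
        observation = TOOLS[action](arg)    # act with a tool...
        transcript.append(f"Observation: {observation}")  # ...and observe
    return None

print(run("What is 6 * 7?"))  # -> 42
```

The transcript of thoughts and observations is what gets fed back to the model on each step, which is what makes the framework iterative.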
OCI Generative AI service:
* A fully managed service that provides a set of customizable Large Language Models (LLMs), available via a single API, to build generative AI applications.
Generation:
Command -> Command Light -> Llama 2 70B
Dedicated AI cluster:
* A dedicated AI cluster provides GPU-based resources that host the customer's fine-tuning and inference workloads.
OCI setup:
configuration file: ~/.oci/config (the default OCI SDK/CLI configuration path)
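For reference, the OCI configuration file uses an INI-style profile format along these lines; all values below are placeholders, not real credentials.

```ini
[DEFAULT]
user=ocid1.user.oc1..<your-user-ocid>
fingerprint=<your-api-key-fingerprint>
key_file=~/.oci/oci_api_key.pem
tenancy=ocid1.tenancy.oc1..<your-tenancy-ocid>
region=us-chicago-1
```

The `[DEFAULT]` profile is used unless another profile name is passed to the SDK or CLI.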
Model parameters:
Temperature: determines how creative the model should be; the default temperature is 1 and the maximum is 5.
Length: approximate length of the summary; choose from short, medium, and long.
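Temperature works by rescaling the model's token scores (logits) before they are turned into probabilities. The sketch below uses made-up logits to show the effect: low temperature sharpens the distribution toward the top token, high temperature flattens it.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before softmax. Higher temperature
    # flattens the distribution (more "creative" sampling); lower
    # temperature sharpens it toward the top-scoring token.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical token scores
low = softmax_with_temperature(logits, temperature=0.5)
high = softmax_with_temperature(logits, temperature=5.0)

# The top token dominates at low temperature and much less at high.
print(max(low), max(high))
```

This is why temperature 0 (or near it) yields nearly deterministic output, while high temperatures make rare tokens more likely to be sampled.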
Embeddings:
An embedding is a numerical representation of a piece of text, converted into a sequence of numbers.
A piece of text could be a word, phrase, sentence, paragraph, or several paragraphs.
The larger embedding models create a 1024-dimensional vector for each embedding; the lighter models create a 384-dimensional vector.
Maximum of 512 tokens per embedding.
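Embeddings are compared with a similarity measure, most commonly cosine similarity. Real models return 1024- or 384-dimensional vectors; the 4-dimensional vectors below are invented to keep the example readable.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1 means same direction
    # (semantically similar), values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, made up for illustration.
king  = [0.8, 0.6, 0.1, 0.0]
queen = [0.7, 0.7, 0.1, 0.1]
apple = [0.0, 0.1, 0.9, 0.8]

print(cosine_similarity(king, queen))  # semantically close -> near 1
print(cosine_similarity(king, apple))  # unrelated -> much lower
```

The same comparison underlies the semantic search and vector database sections below: nearby vectors mean similar meaning.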
Fine-tuning a model with billions of parameters is very difficult, so in-context learning / few-shot prompting is used instead.
GPU memory is limited, so switching between models can incur significant overhead due to reloading the full GPU memory.
Dedicated AI cluster units:
* Large Cohere - dedicated AI cluster units for hosting or fine-tuning the Cohere Command model
* Small Cohere - dedicated AI cluster units for hosting or fine-tuning the smaller Cohere Command model
* Embed Cohere - dedicated AI cluster units for hosting the Cohere embedding models
* Llama2-70 - dedicated AI cluster units for hosting the Llama 2 70B models
Fine-tuning requires 2 units, and each cluster is active for five hours.
RAG framework:
Retriever: acts like a search engine, fetching documents relevant to the query.
Ranker: evaluates and prioritizes the retrieved results based on their quality and relevance.
Generator: produces human-like text grounded in the retrieved information.
RAG techniques:
RAG Sequence - the same retrieved documents are used to generate the entire response.
RAG Token - different documents can be retrieved for each generated token.
RAG pipelines:
Documents -> Chunks -> Embedding -> Index [database]
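The documents → chunks → embeddings → index pipeline can be sketched end to end. Here `embed()` is a stand-in for a real embedding model (it just counts words from a tiny fixed vocabulary), and a plain list stands in for the vector database; the documents are invented for illustration.

```python
# Toy RAG indexing pipeline: documents -> chunks -> embeddings -> index.
VOCAB = ["oci", "cluster", "gpu", "embedding", "vector", "search"]

def embed(text):
    # Stand-in embedding model: bag-of-words counts over a tiny vocabulary.
    words = text.lower().split()
    return [words.count(w) for w in VOCAB]

def chunk(document, size=4):
    # Naive fixed-size chunking by word count.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

documents = [
    "OCI dedicated AI cluster hosts GPU workloads",
    "An embedding is a vector used for semantic search",
]

index = []  # in-memory stand-in for a vector database
for doc in documents:
    for c in chunk(doc):
        index.append((embed(c), c))  # store (embedding, chunk) pairs

# Retrieval: embed the query and return the closest chunk (dot product).
query_vec = embed("vector search")
best = max(index, key=lambda entry: sum(q * d for q, d in zip(query_vec, entry[0])))
print(best[1])
```

In a real pipeline the retrieved chunks would then be passed to the generator as context for answering the query.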
Vector database:
A vector is a sequence of numbers, called dimensions, used to capture the important "features" of the data.
Semantic search:
It means searching by meaning rather than by exact keyword matching.