I am sure that most of you reading this blog are aware of Large Language Models (LLMs) like ChatGPT, Claude, and DeepSeek. Recently, I have started learning more about the inner workings of these models, and I would like to share what I have learned in this blog. Before I start this post on a broad overview of LLMs, I want to call out two fascinating resources that have tremendously helped me understand these concepts: 3Blue1Brown’s YouTube series on Neural Networks and Andrej Karpathy’s deep dive on LLMs.
1. The Foundation: Neural Networks and Transformers
At the heart of every LLM is a deep learning model called a neural network, loosely inspired by the way human brains process information. Modern LLMs are based on the Transformer architecture, introduced in the 2017 Google paper Attention Is All You Need. Transformers allow models to efficiently process and understand vast amounts of text data by focusing on key relationships between words through a mechanism called self-attention.
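To make self-attention concrete, here is a minimal NumPy sketch of scaled dot-product attention over a toy sequence. The embeddings and the query/key/value projection matrices are random stand-ins for what a real Transformer would learn during training:

```python
import numpy as np

# Toy illustration (not a trained model): scaled dot-product self-attention
# over a sequence of 4 token embeddings of dimension 8.
rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))          # one embedding vector per token

# In a real Transformer, Wq, Wk, Wv are learned; here they are random.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

scores = Q @ K.T / np.sqrt(d)        # how strongly each token attends to the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                 # each row is a context-aware token representation

print(weights.shape, output.shape)   # (4, 4) (4, 8)
```

Each row of `weights` is a probability distribution saying how much that token "looks at" every other token; the output mixes value vectors accordingly, which is how context flows between words.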
2. Training on Massive Datasets
LLMs are trained on enormous amounts of text data from books, articles, websites, and other sources. The model learns to predict the next word in a sentence by adjusting billions of parameters—mathematical values that determine how the AI interprets language patterns. The training process involves:
Gathering Data – The data is the essence of the LLM. It is incredibly important to source high-quality data so the model does not absorb biases or learn to produce misinformation.
Tokenization – This involves breaking the text into smaller units (tokens) that the model can process. Each token can be a single word or a part of a word.
Training – The model is trained on the text previously gathered. LLMs have hundreds of billions of parameters, each of which is adjusted slightly as every token is fed through the model.
Fine-tuning – The model is refined on specific datasets for improved performance in certain tasks.
Reinforcement Learning from Human Feedback (RLHF) – A method where humans provide feedback to improve the model’s responses, making them more helpful and aligned with user expectations.
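The tokenization step above is worth seeing in action. Real models use learned schemes such as byte-pair encoding (BPE), but this toy sketch with a tiny hand-made vocabulary shows the core idea: text becomes a sequence of integer token IDs, and a token can be a whole word or just a piece of one.

```python
# Toy illustration of subword tokenization: greedy longest-match against a
# tiny invented vocabulary (the words and IDs here are made up for the example).
vocab = {"un": 0, "believ": 1, "able": 2, "token": 3, "ization": 4, " ": 5}

def tokenize(text):
    tokens = []
    i = 0
    while i < len(text):
        # Find the longest vocabulary entry matching at position i.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(vocab[text[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens

print(tokenize("unbelievable tokenization"))  # [0, 1, 2, 5, 3, 4]
```

Note how "unbelievable" splits into three subword tokens ("un", "believ", "able"), which is exactly why LLMs can handle rare or novel words their vocabulary has never seen whole.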
3. Inference: How LLMs Generate Responses
Once trained, an LLM generates text through inference. When given a prompt, the model analyzes the context and predicts the most likely next tokens based on its training. Techniques like temperature control (adjusting randomness in responses) and top-k/top-p sampling (filtering token choices) help refine outputs.
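Here is a small sketch of how temperature and top-k sampling shape the next-token choice. The candidate tokens and their raw scores (logits) are made up for the example; a real model would produce scores over its entire vocabulary:

```python
import numpy as np

# Made-up candidate next tokens and their raw model scores (logits).
tokens = ["cat", "dog", "car", "sun", "tree"]
logits = np.array([2.0, 1.5, 0.5, 0.2, -1.0])

def sample(logits, temperature=1.0, top_k=None):
    scaled = logits / temperature            # low T -> sharper, high T -> flatter
    if top_k is not None:
        cutoff = np.sort(scaled)[-top_k]     # keep only the k highest-scoring tokens
        scaled = np.where(scaled >= cutoff, scaled, -np.inf)
    probs = np.exp(scaled - scaled.max())    # softmax over the surviving tokens
    probs /= probs.sum()
    return np.random.default_rng().choice(len(logits), p=probs)

# With top_k=3, only "cat", "dog", or "car" can ever be chosen.
print(tokens[sample(logits, temperature=0.7, top_k=3)])
```

Lowering the temperature concentrates probability on the top-scoring tokens (more deterministic output), while top-k simply zeroes out everything outside the k best candidates before sampling.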
4. Challenges and Limitations
Despite their power, LLMs have certain limitations:
Bias and Misinformation – They can inherit biases from training data or generate incorrect information.
Computational Costs – Training and running LLMs require enormous computational power.
Lack of True Understanding – LLMs predict words statistically but do not truly “understand” meaning like humans do. This can sometimes lead to “hallucinations,” where the model generates a plausible-sounding but fabricated response.
5. The Future of LLMs
Advancements like smaller, efficient models, multimodal AI (text + images + speech), and better safety mechanisms will shape the future of LLMs. As research progresses, we can expect more accurate, ethical, and accessible AI systems.
Key trends to watch:
Smaller, specialized models – Instead of one giant LLM, companies are developing domain-specific models.
Multimodal capabilities – AI models that process text, images, and video seamlessly.
Reasoning – Models that think through intermediate steps before generating a final output, helping them arrive at better responses.
Search – Most LLMs already have a search feature that allows them to access up-to-date information, which is helpful because every model has a knowledge cutoff (GPT-4o’s is October 2023).
I hope this post was interesting and informative! I will write more detailed posts about each step of the process in the coming weeks.