
The spelled-out intro to language modeling: building makemore

15 chapters with key takeaways — read first, then watch
1. Introduction to makemore and Language Modeling (0:00–3:02, 3m 2s) · Intro
2. Loading and Analyzing the Names Dataset (3:03–5:44, 2m 41s) · Concept
3. Building a Bi-gram Language Model by Counting (5:45–12:44, 6m 59s) · Concept
4. PyTorch Tensor for Bi-gram Counts and Visualization (12:45–24:09, 11m 24s) · Architecture
5. Generating Names by Sampling from Bi-gram Probabilities (24:10–36:15, 12m 5s) · Demo
6. Optimizing Probability Matrix with PyTorch Broadcasting (36:16–50:15, 13m 59s) · Concept
7. Measuring Model Quality: Negative Log Likelihood Loss (50:15–1:00:07, 9m 52s) · Concept
8. Addressing Zero Probabilities with Model Smoothing (1:00:07–1:03:40, 3m 33s) · Concept
9. Neural Network Approach for Bi-gram Language Model (1:03:40–1:10:01, 6m 21s) · Architecture
10. One-Hot Encoding and Single Linear Layer in PyTorch (1:10:01–1:18:46, 8m 45s) · Architecture
11. Transforming Neural Network Outputs to Probabilities (Softmax) (1:18:46–1:28:58, 10m 12s) · Architecture
12. Calculating Negative Log Likelihood Loss for Neural Network (1:28:58–1:38:36, 9m 38s) · Architecture
13. Training the Neural Network with Gradient Descent (1:38:36–1:47:47, 9m 11s) · Training
14. Equivalence to Bi-gram Model and Regularization as Smoothing (1:47:47–1:54:29, 6m 42s) · Concept
15. Sampling from Neural Net and Future Directions (1:54:29–1:57:45, 3m 16s) · Conclusion
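Chapters 3–8 cover the counting-based bi-gram model. A minimal sketch of that pipeline (counting into a tensor, smoothing, normalization, sampling, and the negative log likelihood) might look like the following. A small inline word list stands in for the names dataset used in the video, so the numbers here are illustrative only:

```python
import torch

# Tiny stand-in corpus; the video uses a large file of names.
words = ["emma", "olivia", "ava", "isabella", "sophia"]

# Character vocabulary with '.' as the start/end token.
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
itos = {i: s for s, i in stoi.items()}
V = len(stoi)

# Count bigram occurrences in a V x V tensor.
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Model smoothing: add 1 to every count so no bigram has zero
# probability, then normalize each row into a distribution.
# Broadcasting divides the (V, V) matrix by a (V, 1) column of sums.
P = (N + 1).float()
P = P / P.sum(dim=1, keepdim=True)

# Sample a new word by walking the bigram chain from the start token.
g = torch.Generator().manual_seed(2147483647)
ix = 0
out = []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))

# Model quality: average negative log likelihood over training bigrams.
log_likelihood = 0.0
n = 0
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
nll = -log_likelihood / n
print(f"average NLL: {nll.item():.4f}")
```

Lower average NLL means the model assigns higher probability to the observed bigrams; without the add-one smoothing, any unseen bigram would yield infinite loss.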

Video Details & AI Summary

Published Sep 7, 2022
Analyzed Jan 21, 2026

AI Analysis Summary

This video provides a detailed, 'spelled-out' introduction to language modeling using the 'makemore' project. It begins by building a character-level bi-gram language model through explicit counting and normalization, demonstrating how to sample new words and how to evaluate model quality with the negative log likelihood. The tutorial then implements the same bi-gram model as a neural network in PyTorch, explaining one-hot encoding, logits, softmax, and gradient-based optimization. It shows that both approaches yield identical results, while highlighting the superior flexibility and scalability of neural networks for more complex future models such as transformers.
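The neural-network half of the summary (one-hot inputs, a single linear layer producing logits, softmax, NLL loss, and gradient descent) can be sketched as follows. Again, a small inline word list is an assumption standing in for the real dataset, and the learning rate and step count are illustrative:

```python
import torch
import torch.nn.functional as F

words = ["emma", "olivia", "ava", "isabella", "sophia"]  # stand-in corpus
chars = sorted(set("".join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi["."] = 0
V = len(stoi)

# Training set of bigrams: xs are input chars, ys are target next chars.
xs, ys = [], []
for w in words:
    chs = ["."] + list(w) + ["."]
    for c1, c2 in zip(chs, chs[1:]):
        xs.append(stoi[c1])
        ys.append(stoi[c2])
xs, ys = torch.tensor(xs), torch.tensor(ys)

# A single linear layer: W, shape (V, V), is the only parameter.
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((V, V), generator=g, requires_grad=True)

for step in range(100):
    # Forward pass: one-hot encode inputs, matrix-multiply for logits
    # (interpreted as log-counts), softmax into probabilities, then take
    # the average negative log likelihood of the correct next characters.
    xenc = F.one_hot(xs, num_classes=V).float()
    logits = xenc @ W
    counts = logits.exp()
    probs = counts / counts.sum(1, keepdim=True)  # softmax
    loss = -probs[torch.arange(len(ys)), ys].log().mean()
    loss = loss + 0.01 * (W ** 2).mean()  # regularization acts like smoothing

    # Backward pass and gradient descent update.
    W.grad = None
    loss.backward()
    with torch.no_grad():
        W -= 50 * W.grad

print(f"final loss: {loss.item():.4f}")
```

The weight penalty pulls all logits toward zero, i.e. toward a uniform distribution, which is why regularization here plays the same role as add-one smoothing in the counting model.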

Title Accuracy Score: 10/10 · Excellent
Processed in 41.5s
Model: gemini-2.5-flash