Andrej Karpathy wrote a self-contained, roughly 200-line microgpt.py in Python that trains and runs inference with the GPT-2 model, the second generation of the model behind ChatGPT.

Karpathy’s version works only with scalar values, processing every coefficient of the artificial neural network one by one. We modified it a few times and timed each version:
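To see what “every coefficient one by one” means, here is a minimal sketch of a micrograd-style scalar autograd engine (a simplification for illustration, not the actual microgpt code): each weight is a separate Python object, and each arithmetic operation records how to propagate gradients one scalar at a time.

```python
class Value:
    """A single scalar with its gradient, micrograd-style (illustrative sketch)."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._children = children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1, d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # product rule, applied to one pair of scalars
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological order, then chain rule node by node
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# y = a*b + a  ->  dy/da = b + 1 = 4, dy/db = a = 2
a, b = Value(2.0), Value(3.0)
y = a * b + a
y.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

With a transformer’s millions of coefficients, this per-scalar bookkeeping is what makes the original version slow.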

| version | description | number of lines (fewer is simpler) | run time (lower is better) | optimised loss (lower is better) |
|---|---|---|---|---|
| scalar | the original version, does math one coefficient at a time | | 160s | 2.66 |
| vector | uses the vector extension on the microprocessor to do math concurrently | 164 | 7s | 2.63 |
| rnn | switches to a vanilla recurrent network, still using the hardware vector extension | 118 | 1.2s | 2.35 |
| rnn_att | as above, but at each step sparsely selects coordinates of the state vector to transform | 119 | 1.2s | 2.44 |
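The drop from 160s to 7s comes from doing arithmetic over whole arrays at once rather than one coefficient at a time. A rough, self-contained illustration of that difference (not the microgpt code itself), comparing a pure-Python scalar loop against one vectorized NumPy call:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a = rng.random(n)
b = rng.random(n)

# scalar style: one multiply-add per coefficient, in the interpreter
t0 = time.perf_counter()
s_scalar = 0.0
for i in range(n):
    s_scalar += a[i] * b[i]
t_scalar = time.perf_counter() - t0

# vector style: one call, letting the CPU's vector units do the work
t0 = time.perf_counter()
s_vec = float(a @ b)
t_vec = time.perf_counter() - t0

print(f"scalar loop: {t_scalar:.3f}s, vectorized: {t_vec:.4f}s")
assert np.isclose(s_scalar, s_vec)
```

The two results agree to floating-point tolerance; only the cost per coefficient changes, which is the same effect driving the table above.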

The code for the experiments is in the scalar, vector, rnn and rnn_att branches respectively. To work with it, clone the tensor-capable micrograd repository and check out its att branch.

mkdir src; cd src
git clone https://github.com/jli05/microgpt
git clone https://github.com/brief-ds/micrograd
cd microgpt
python3 -m venv venv
. venv/bin/activate
cd ../micrograd && git checkout att && pip3 install . && cd ../microgpt

git checkout scalar
time python3 microgpt.py

git checkout vector
time python3 microgpt.py

# ...

References

microgpt blog post, https://karpathy.github.io/2026/02/12/microgpt/

Karpathy’s microgpt, which works on scalar values only, https://gist.github.com/karpathy/8627fe009c40f57531cb18360106ce95

our microgpt, with the four versions scalar, vector, rnn and rnn_att in their respective branches, https://github.com/jli05/microgpt

our tensor-capable micrograd, whose att branch powers the study in this post, https://github.com/brief-ds/micrograd