Friday, January 16, 2026

Run AI models locally

For an index to all my stories click this text.

It is almost unbelievable that ChatGPT, the AI chatbot that started it all, was released to the public on 30 November 2022. That is (at the moment of this writing) a bit over 3 years ago!!
According to Wikipedia, by January 2023 ChatGPT had become the fastest-growing consumer software application in history, gaining over 100 million users just two months after its release.

Soon several other AIs emerged, like Copilot, Claude, Deepseek, Gemini, Grok and Meta.ai.
You can click any of the above-mentioned AI models to get directed to their website, where you can try them out.
All these AI systems run on large multiprocessor computers somewhere in the cloud and cost billions of dollars. And the good part is that most of them can be used for free!!!
You can access these models through your web browser or with dedicated apps on your smartphone.

Meta.ai is part of the Meta family of applications like WhatsApp, Facebook and Instagram. And Meta has partially made their model open source. This made it possible for companies and individuals to build their own AI systems. And when you look around on the internet you will indeed find several AI systems that are built for specific purposes. There are AIs available for writing and editing text, programming, medical aid and of course porn.

Meta's models are called Llama, which stands for Large Language Model Meta AI. Models like these are called LLMs (Large Language Models) for short. The largest of these models need hundreds of gigabytes to run.

But some users started to experiment and soon small models were released. And now there are even models that run on the humble Raspberry Pi.
And as I love playing with AI I had to give it a go.

Ollama

Ollama is a website where you can search for LLMs and download them.

To get an AI running on your Pi you need a program. That program is freely available and is called Ollama. You can find it on the Ollama website:

https://ollama.com/

Installing Ollama

Please note that this will only work on a Raspberry Pi 4 or Pi 5 with at least 8GB of memory.

To install Ollama I advise starting with a new and fresh Raspberry Pi OS installation.
Preferably do a full OS install and then use raspi-config to reboot the system in console mode. That way the Raspberry Pi has as much free memory available as possible.
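To switch to console mode start the configuration tool with:

sudo raspi-config

and look for the boot option (on recent Raspberry Pi OS versions it should be under System Options > Boot / Auto Login, where you can select Console).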
For a refresher course on how to set up your Raspberry Pi look here:
https://www.raspberrypi.com/documentation/computers/getting-started.html

Then do an update and upgrade with:

sudo apt update

sudo apt upgrade

Now that your Raspberry Pi is fully operational and up to date, you can install Ollama.

To install Ollama enter the following command (just copy and paste):

curl -fsSL https://ollama.com/install.sh | sh

Running this will take some time.


On my Raspberry Pi 5 with 8GB it took about a minute.


After a while the installation finishes.
The last line shows that Ollama did not detect a GPU from NVIDIA or AMD.
This means that all computations will be done by the Raspberry's processor.
A GPU would speed up the processing of commands a lot, but alas I do not have one (yet).

This is just the framework that is needed to run the LLMs.
So the next step is to load the LLM itself.

Finding the right LLM

You need to find LLMs that are smaller than your Pi's memory.
LLMs are stored on your SD card but cannot be run from it.
They need to be loaded into the Pi's memory to run, otherwise processing is too slow.
So the LLMs you need to look for must be smaller than the memory of your Pi.
In my case they therefore need to be smaller than 8GB.


Start by visiting the Ollama website:

https://ollama.com

In the top right corner of the site click on Models and you'll get directed to a webpage where all available models are listed.

As I have played a bit with different models I know from experience that Gemma is a really good start.

Gemma 3 is the latest model but unfortunately it is too big for my 8GB Pi. But Gemma 2 is just right. So at the top of the page in the search function fill in: Gemma

The one we are looking for is gemma2 and that's the second one that comes up.

Click on the name and a screen with the details of the available models shows up.


The smallest model is gemma2:2b. That one is only 1.6GB in data but just not "educated" enough. The largest model is gemma2:27b. That one is the smartest but needs 16GB. It would (I presume) fit in a Raspberry Pi 5 with 16GB but alas not in my 8GB version.

So what we need is gemma2:9b, which occupies just 5.4GB and fits in my 8GB Raspberry Pi 5.

At the top of the page it says: ollama run gemma2
That is the general line to start this LLM with Ollama. But we specifically need the 9b version. So the line should be:

ollama run gemma2:9b

Running the LLM

Now switch over to the Pi's console screen and type the command:

ollama run gemma2:9b


This takes a while because the first time you use this model it needs to be downloaded from the Ollama site onto your SD card. On my Pi 5 this took about 10 minutes.
The next time you run this model it starts faster because it is already on your SD card and only needs to be loaded into the Pi's memory.
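By the way, if you just want to download a model without immediately starting a conversation, you can use Ollama's pull command:

ollama pull gemma2:9b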

The line at the bottom: >>> Send a message (/? for help)
indicates that the model has loaded into your Pi's memory and is ready for use.

You can type your questions directly after the three arrows (>>>).

The first test.

Let's try something simple. Let's ask the AI for the value of Pi with 50 decimals.


And there it is. The digits appear on the screen one by one and the answer took about a minute to complete.

Here is another example.


I asked gemma to explain gravity in 5 lines of text, and it did a great job.
If you omit the 5-lines-of-text limit you get an enormous amount of text including Newton's laws, Einstein's theories etc. Trust me, I have tried it.


I also asked gemma to explain gravity to me as if I were 5 years old. And above is the answer.

Can it program in MicroPython?

I wanted to see if gemma could help with programming my Raspberry Pi Pico. So I asked it to write a small program that blinks the internal LED 10 seconds on and 10 seconds off.


And what do you say to that.
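For reference, a minimal MicroPython program for this task looks roughly like the sketch below. This is my own version, not gemma2's literal output, and it assumes a recent MicroPython build where the Pico's on-board LED can be addressed as "LED" (on older firmware for the original Pico you would use Pin(25) instead).

from machine import Pin
import time

# On-board LED; use Pin(25) on older firmware for the original Pico
led = Pin("LED", Pin.OUT)

while True:
    led.on()           # LED on for 10 seconds
    time.sleep(10)
    led.off()          # LED off for 10 seconds
    time.sleep(10)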

Stopping Ollama

If you want to clear the current conversation just use the clear command like this:

>>>/clear

And if you want to quit Ollama use /bye like this:

>>>/bye

This last command brings you back to the system prompt.

Only gemma2 ???

Well no, you can download as many models as you like as long as there is room on your SD card and they are smaller than the memory of your Pi.
But as they use up most of the Pi's memory you cannot run more than one model at a time.
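If you want to check which model is currently loaded into memory, recent Ollama versions have a ps command for that:

ollama ps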

I have for example been playing with deepseek-r1:8b, which was no success.

But dolphin-llama3:8b works great. This model is more language oriented and is uncensored.
You can run it with:

ollama run dolphin-llama3:8b

Please note that, just like gemma2, this model has to be downloaded first, so that takes some time.

And at this moment my favorite is mistral-openorca:7b
You can run this with:

ollama run mistral-openorca:7b

To check how much space there is left on your SD card use the df command.
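The -h option makes df show the sizes in a human-readable form:

df -h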


The example shows that my SD card is 64GB and there is still about 39GB available.
So I can download a lot of other models to play with.

If you want to know which models have already been downloaded to the SD card just use:

ollama list


As you can see I have downloaded 5 different models. And I still have room left on my SD card for experimenting with other models.
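And should the SD card fill up, you can remove a downloaded model again with Ollama's rm command, for example:

ollama rm gemma2:9b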

Concluding.

Running AI on your Pi is simple and fun to play with.
Just use models that are well below 8GB.
And remember: the larger the model, the more educated it is.
So you cannot compare a local 5GB model to ChatGPT or Gemini, which run on billion-dollar computer systems.
Nevertheless it's fun to play with, and costs nothing more than a simple SD card.

Oh and did I say it is fun to play with ???
Really, give it a try yourself, and be amazed.

Till next time
have fun

Luc Volders