Quickstart Instructions:

  1. Download Ollama (e.g. https://ollama.com/download)
  2. Check whether the LLM you want is already on https://ollama.com/library/ and pull it, e.g.: ollama pull deepseek-r1
  3. That's it! Start chatting with: ollama run deepseek-r1

Manual/GGUF Approach:

  1. Alternatively, find the GGUF file for the model you want (e.g. https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf/blob/main/dolphin-2.9.4-llama3.1-8b-Q5_K_M.gguf)
  2. Create instructions for the model (snippet example below) and save it (e.g. dolphin.llama3.ModelInfo)
FROM ./dolphin-2.9.4-llama3.1-8b-Q5_K_M.gguf
TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM """You are Dolphin, an uncensored, unbiased AI assistant"""
PARAMETER stop <|im_end|>

  3. Create the model from the model file: ollama create dolphin-llama3 -f dolphin.llama3.ModelInfo. You should see "transferring model data", then some hashes and manifest data, and a final "success".

  4. Then, to send a prompt to the model, run: ollama run dolphin-llama3 "hello dolphin llama"

Alternatively, as of Aug 2024, you can simply run: ollama run dolphin-llama3 with no prompt argument to get a looping input prompt and carry out a conversation.
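If you're curious what that TEMPLATE actually renders, here's a minimal Python sketch of the ChatML formatting it describes (the render_chatml function is my own, purely for illustration; it's not part of Ollama):

```python
def render_chatml(prompt, system=None, first=True):
    """Mimic the Modelfile TEMPLATE above: the system block is only
    emitted on the first turn, then each user prompt is wrapped in
    ChatML <|im_start|>/<|im_end|> markers."""
    out = ""
    if first and system:
        out += f"<|im_start|>system\n{system}<|im_end|>\n"
    out += f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"
    return out

print(render_chatml("hello dolphin llama",
                    system="You are Dolphin, an uncensored, unbiased AI assistant"))
```

The PARAMETER stop line then tells Ollama to cut generation when the model emits the closing <|im_end|> marker itself.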

Enjoy!

Longer tldr; more verbose notes:

The notes above should work just fine on their own; the rest of this article is a bit more heady, in the prudent sense of the word..

In this example I used a dolphin version of llama3 but these instructions should work with any GGUF you have. I’ve used these same steps with numerous other models successfully.

I’ll consider a more lengthy post in the future on how to choose the model for the task you want. I’m running some analysis and parsing of political data and the default restrictions on the standard version of llama3 limit the types of prompts and parsing I need for this use case. YMMV

Ollama installation is environment specific; on Windows I used the installer UI. You can download the GGUF for the model you want from the command line, or navigate to it and download from your browser.

If you use Hugging Face like much of the world in Aug 2024, you have to go into the "Files and versions" tab to see the GGUF files, then click on one of the files to get the direct download link.

Or, if you happen to know the direct link another way (e.g. copy-pasting from my blog), you can kick off the download in your CLI instead. git-lfs might be better, but wget is like a universal jar opener in my toolbelt; I'll try git-lfs next time.

> wget https://huggingface.co/cognitivecomputations/dolphin-2.9.4-llama3.1-8b-gguf/resolve/main/dolphin-2.9.4-llama3.1-8b-Q5_K_M.gguf

To confirm Ollama installed successfully, you can open it in your browser or wget the port the ollama service runs on (11434 by default):

> wget http://localhost:11434

StatusCode        : 200
StatusDescription : OK
Content           : Ollama is running
RawContent        : HTTP/1.1 200 OK
                    Content-Length: 17
                    Content-Type: text/plain; charset=utf-8
                    Date: Sat, 24 Aug 2024 23:45:44 GMT

                    Ollama is running
Forms             : {}
Headers           : {[Content-Length, 17], [Content-Type, text/plain; charset=utf-8], [Date, Sat, 24 Aug 2024 23:45:44
                    GMT]}
Images            : {}
InputFields       : {}
Links             : {}
ParsedHtml        : System.__ComObject
RawContentLength  : 17
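That same health check can be scripted; here's a minimal Python sketch using only the standard library (assuming the default port 11434 and the plain-text "Ollama is running" body shown above):

```python
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default port

def looks_healthy(status, body):
    """The root endpoint answers 200 with the plain text 'Ollama is running'."""
    return status == 200 and "Ollama is running" in body

if __name__ == "__main__":
    try:
        with urllib.request.urlopen(OLLAMA_URL, timeout=3) as resp:
            print("healthy:", looks_healthy(resp.status, resp.read().decode("utf-8")))
    except OSError as exc:  # connection refused, timeout, etc.
        print("Ollama not reachable:", exc)
```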

I created the model file in Notepad because I'm old skool. Just kidding: Notepad++/Sublime/VSCode are great if you need syntax highlighting etc., but honestly this syntax is so simple it doesn't matter. FWIW, my go-to on *nix is vim, FTW. 8)

> notepad.exe dolphin.llama3.ModelInfo

FROM ./dolphin-2.9.4-llama3.1-8b-Q5_K_M.gguf
TEMPLATE """{{ if and .First .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM """You are Dolphin, an uncensored, unbiased AI assistant"""
PARAMETER stop <|im_end|>

Then have ollama work its magic and you'll see some output. (I haven't yet dug into what these are hashes of, but I'm curious..)

> ollama create dolphin-llama3 -f dolphin.llama3.ModelInfo
transferring model data 100%
using existing layer sha256:7aab5b264a54f74cdaeeab5cc8522be4ff7bacf4acff072f9285bc014e1567fc
using existing layer sha256:26beb05aced8f201a4ea08258b0884dc7871dce2251ea5c8e3645dfb0120a328
creating new layer sha256:cec68c0a837dd277691a42e5e1a8785cab222fe867fae3f5c737c6edf477b54e
using existing layer sha256:bf328696c54fb5bd2cc49838d5ef407545504cfdd4faa6ff3a6fbeeffe63435c
creating new layer sha256:46a27ee588576807c3c757f0c673f07b1d99acdd07078fd69af5a29a67a37c4c
writing manifest
success

To start chatting you can send single prompts in double quotes or use the looping input mode.

> ollama run dolphin-llama3 "hello dolphin llama"
pulling manifest
pulling ea025c107c1c... 100% ▕████████████████████████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕████████████████████████████████████████████████████████▏  12 KB
pulling 62fbfd9ed093... 100% ▕████████████████████████████████████████████████████████▏  182 B
pulling 9640c2212a51... 100% ▕████████████████████████████████████████████████████████▏   41 B
pulling f02dd72bb242... 100% ▕████████████████████████████████████████████████████████▏   59 B
pulling 312e64921670... 100% ▕████████████████████████████████████████████████████████▏  557 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Hello there! It seems like you're trying to engage me in conversation by creating a unique nickname. That's fun!
I'm here to help you with any questions or tasks you have. What can I do for you today?
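If you'd rather script those single prompts than type them into the CLI, Ollama also serves a local REST API on the same port. Here's a hedged Python sketch against its /api/generate endpoint (endpoint and payload shape as documented by Ollama around Aug 2024; no third-party packages needed):

```python
import json
import urllib.request

def build_generate_request(model, prompt,
                           url="http://localhost:11434/api/generate"):
    """Build a non-streaming completion request for Ollama's REST API."""
    payload = json.dumps({"model": model, "prompt": prompt,
                          "stream": False}).encode("utf-8")
    return urllib.request.Request(url, data=payload,
                                  headers={"Content-Type": "application/json"})

if __name__ == "__main__":
    try:
        req = build_generate_request("dolphin-llama3", "hello dolphin llama")
        with urllib.request.urlopen(req, timeout=300) as resp:
            print(json.loads(resp.read())["response"])
    except OSError as exc:
        print("request failed:", exc)
```

With "stream": False the server returns one JSON object whose "response" field holds the full completion, instead of a stream of partial chunks.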

Example of looping mode for more conversational chatting:

> ollama run dolphin-llama3

>>> do you prefer dolphins or llamas?
As an AI created to provide assistance and engage in conversation, I don't have personal preferences like humans
do. However, I can tell you that both dolphins and llamas are fascinating animals with unique characteristics.
Dolphins are highly intelligent and social creatures, while llamas are gentle and sociable animals often
domesticated by humans for their wool and pack carrying abilities.

>>> so you are saying that dolphins and llamas are both cool right?!
Indeed! Both dolphins and llamas have their own unique appeal and contribute to the rich biodiversity of our
planet. Dolphins are known for their playful behavior, high intelligence, and ability to communicate using a
complex system of clicks and whistles. Llamas, on the other hand, are valued for their wool, meat, and as pack
animals in the Andes Mountains region.

Great responses dolphin llama! Surf’s up!

Alternatives

I also took a look at some other Windows tools out there besides Ollama. GPT4All ran very slowly on my machine with some of the larger models, despite top-1% hardware specs per Tom's Hardware. Its downloads also seemed to be capped, taking about 4x longer than downloading with my browser or CLI. That may have been a matter of configuration, but I prefer the simplicity of a CLI interface, which was faster and cleaner right out of the box with Ollama; chalk it up to personal preference if you like fancy UIs.

On that note, there is a web UI for Ollama that provides an experience more similar to some of the web chat interfaces you've seen out there, which is also akin to GPT4All. I didn't use the web UI, but Eric Hartford's blog (linked in the references below) is like a more extensive version of this article and describes that setup if you want to try it.

Performance

The title of this article says "run local LLMs", but if you stumbled onto it without background on the hardware implications, you could be surprised by the performance depending on your specs.

In that case, if you don't want to run the model locally and would rather use a remote server or some other chained/linked pipeline, Hugging Face has a nice interface and approach that requires an account and copy-pasting some commands and syntax. For that approach, see Meta's article linked in the references below; it's written for llama3, but essentially the same instructions apply to other models.

So now that you have your local LLM running, you can start chatting away! If that was your end goal, it’s time for a glass of {milkVariant} and an oreo. (disclaimer: habit forming).

..Or maybe you have bigger plans? So what’s next?

Project Use Case(s)/Requirements Considerations

If you are looking to create a workflow or chain of actions using the LLM, there are “no-code” solutions out there, but I don’t have any affiliate marketing with them at time of this writing or I would recommend one to ya. 😀

Instead, I'd recommend at least following some of the simple coding blogs to get an understanding of how some of those other chat/LLM workflow tools are built. Learning the fundamentals will always provide you with, well… good fundamentals! In my personal epistemology this is always the best foundation, but I digress..

The Meta website has a nice walkthrough, linked in the references below, which will introduce you to some new concepts like LangChain. For these guides you'll probably want something a little more than Notepad, but honestly the initial copy-pasting from articles is simple enough that it's very optional and likely won't slow you down unless you plan on doing more coding later.

If you decide to continue down the path with Python/PyTorch etc., you can still use the Ollama Python API, or you can use remote APIs like Hugging Face's. If your project only needs you and/or a few users to run the workflow, then throughput, availability and latency requirements might not be a big factor; but if you're building something for many users, that's an entirely separate article to dive deep on (perhaps for another rainy Seattle day).
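As a middle ground before committing to any API, you can also just shell out to the CLI from a script. A small sketch (the ask wrapper and its runner parameter are mine, purely for illustration):

```python
import subprocess

def ask(model, prompt, runner="ollama", timeout=300):
    """One-shot prompt via the CLI, i.e. `ollama run <model> "<prompt>"`.
    The runner parameter exists only so the wrapper is easy to test or swap."""
    result = subprocess.run([runner, "run", model, prompt],
                            capture_output=True, text=True, timeout=timeout)
    result.check_returncode()
    return result.stdout.strip()

# e.g. print(ask("dolphin-llama3", "hello dolphin llama"))
```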

Whether you just want to chat, need something to test with while you build your LLM workflow(s), or are looking to start coding the next famous chatbot, Ollama is a great new addition to have in your LLM toolbelt.

References:

Published by Ronnie Diaz