Ollama Python Library – Tutorial with Examples

The Ollama Python library provides a simple interface to Ollama models. Under the hood it uses the Ollama REST API, which allows interaction with the different models from the Ollama model library, and it exposes almost all functions of the REST API. The library is structured so that it is easy to use even for programming beginners. This blog article mainly describes the use of the chat method of the Ollama Python library.
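
Since the library talks to the Ollama REST API under the hood, the same chat request can also be sent directly to the local Ollama server. The following sketch is only an illustration; it assumes Ollama is running on its default port 11434 and uses the separate requests package, which is not part of the Ollama Python library:


# Sketch: calling the Ollama REST API directly (assumes a local server on port 11434)
import requests

response = requests.post(
  'http://localhost:11434/api/chat',
  json={
    'model': 'mistral',
    'stream': False,
    'messages': [
      {'role': 'user', 'content': 'Explain to me the meaning of life?'}
    ],
  },
)

# The JSON response contains the generated answer under the 'message' key
print(response.json()['message']['content'])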

Installation

To install the Ollama Python library, simply run the following command in your terminal:


pip install ollama

This will download and install the latest version of the Ollama Python library from PyPI (Python Package Index). After installation, you should be able to import it into any Python script or Jupyter notebook.
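
To verify the installation, you can import the package and print the installed version. This is just a minimal check; the version lookup below uses Python's standard importlib.metadata module rather than any attribute of the ollama package itself:


# Verifying that the Ollama Python library is installed and importable
import ollama
from importlib.metadata import version

# Printing the installed package version from the package metadata
print(version('ollama'))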

First Usage with Mistral Model and System Message with Ollama Python

In the following example, we call the Mistral LLM model for the first time using the Ollama Python library. To do this, we need to structure our question as a message object.

A message object contains the message text in a content field and an additional role field. The role can be either system, user or assistant. Messages with the role assistant are the answers to our questions that the LLM generates via Ollama.
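
As plain Python data, a short conversation with all three roles could look like the following sketch (the assistant content here is just an illustrative placeholder, not real model output):


# Illustrative message objects for the three roles
messages = [
  {'role': 'system', 'content': 'You are a helpful assistant.'},
  {'role': 'user', 'content': 'What is Stoicism?'},
  # Messages with the role 'assistant' are normally generated by the model itself
  {'role': 'assistant', 'content': 'Stoicism is a school of Hellenistic philosophy ...'},
]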

An optional first step is to define a system message as the first message in the list. With this so-called system prompt, we can set the persona of the assistant according to our needs. In the example below, the system message is: You are a helpful assistant with sound philosophical knowledge.

The following call to the Ollama model is a simple request-response call. This means we have to wait until the model has fully generated the response before we can read it. Later we will show another way that reads the response chunk by chunk while it is still being generated.

Here’s an example of how you might use this library:


# Importing the required library (ollama)
import ollama

# Setting up the model and defining the input messages (including an optional system message)
ollama_response = ollama.chat(model='mistral', messages=[
  {
    'role': 'system',
    'content': 'You are a helpful assistant with sound philosophical knowledge.',
  },
  {
    'role': 'user',
    'content': 'Explain to me the meaning of life?',
  },
])
# Printing the generated response
print(ollama_response['message']['content'])

In this example, we set up a conversation with an AI assistant that has sound philosophical knowledge. The user then asks about the meaning of life, and the Ollama Python library returns the answer from the Mistral model.
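
Besides the generated text, the returned response also contains metadata such as the name of the model that produced the answer. The exact shape can vary between library versions (newer versions return a response object that still supports dictionary-style access), so treat the following lines as a sketch that continues the example above:


# Inspecting the response beyond the generated text (continues the example above)
print(ollama_response['model'])               # name of the model that produced the answer
print(ollama_response['message']['role'])     # 'assistant' for generated replies
print(ollama_response['message']['content'])  # the generated answer itself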

Streaming Responses with Ollama Python

In the Ollama Python library, you can use the stream parameter to switch on streaming of the generated response. This means the response is transmitted while it is still being generated, so you do not have to wait until it is complete. To do this, set stream to True and output the response chunk by chunk in a loop.


# Importing the required library (ollama)
import ollama

# Setting up the model, enabling streaming responses, and defining the input messages
ollama_response = ollama.chat(
  model='mistral',
  stream=True,
  messages=[
    {
      'role': 'system',
      'content': 'You are a helpful assistant with sound philosophical knowledge.',
    },
    {
      'role': 'user',
      'content': 'Explain to me the meaning of life?',
    },
  ]
)

# Printing out each piece of the generated response while preserving order
for chunk in ollama_response:
  print(chunk['message']['content'], end='', flush=True)

In this example, we’re setting up a conversation with an AI assistant that has sound philosophical knowledge. The user then asks about the meaning of life, and the library returns the answer from Ollama token by token as it is generated.
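
If you also need the complete text after streaming has finished, you can collect the chunks into a single string while printing them. The same pattern is used in the dialogue example in the next section; here is a minimal, self-contained sketch:


# Collecting the streamed chunks into one string while printing them
import ollama

full_answer = ''
stream = ollama.chat(
  model='mistral',
  stream=True,
  messages=[{'role': 'user', 'content': 'Explain to me the meaning of life?'}],
)
for chunk in stream:
  full_answer += chunk['message']['content']
  print(chunk['message']['content'], end='', flush=True)

# The complete response text is now available in full_answer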

Ollama Python – Ongoing Dialogue with Context (Chat-like)

The chat function of the Ollama Python library also supports a continuous dialogue. For this, it is important to store the context, i.e. the user's questions and the assistant's (LLM's) answers, in a variable and to pass it to the chat function on every call.

For the sake of simplicity, we have divided the code into three functions: create_message, chat, and ask.

  1. create_message: Creates a message object with the associated role
  2. chat: Calls the Ollama chat function with the current message history
  3. ask: Starts the communication process with a new user question

The chat_messages list contains the current context of the conversation. It is important to extend it with the user's question and the LLM's answer on every call and to transmit it to the LLM. This is the only way to preserve the context and ensure the dialogue capability of the LLM.

Here is a code example:


# Importing the required library (ollama)
import ollama

# Initializing the chat message list with the initial system message
system_message = 'You are a helpful assistant.'
chat_messages = [{'role': 'system', 'content': system_message}]

# Defining a function to create new messages with specified roles ('user' or 'assistant')
def create_message(message, role):
  return {
    'role': role,
    'content': message
  }

# Sending the current chat history to the model and streaming back the assistant response
def chat():
  # Calling the ollama API to get the assistant response
  ollama_response = ollama.chat(model='mistral', stream=True, messages=chat_messages)

  # Preparing the assistant message by concatenating all received chunks from the API
  assistant_message = ''
  for chunk in ollama_response:
    assistant_message += chunk['message']['content']
    print(chunk['message']['content'], end='', flush=True)
    
  # Adding the finalized assistant message to the chat log
  chat_messages.append(create_message(assistant_message, 'assistant'))

# Function for asking questions - appending user messages to the chat logs before starting the `chat()` function
def ask(message):
  chat_messages.append(
    create_message(message, 'user')
  )
  print(f'\n\n--{message}--\n\n')
  chat()

# Sending two example requests using the defined `ask()` function
ask('Please list the 20 largest cities in the world.')
ask('How many of the cities listed are in South America?')

In this example, we set up an ongoing conversation with a helpful assistant. The first question asks for the 20 largest cities in the world; the follow-up question about South America refers back to that answer, which only works because the complete chat history is passed to the model on every call of the ask function.
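
To turn the example into a real interactive chat, the ask function can be called in a simple input loop. The following sketch assumes that the chat_messages list and the ask function from the example above are already defined:


# Simple interactive loop built on top of the ask() function defined above
while True:
  user_input = input('\n\nYou: ')
  # Typing 'exit' or 'quit' ends the conversation
  if user_input.strip().lower() in ('exit', 'quit'):
    break
  ask(user_input)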

Ollama Python Options – Temperature Option

The temperature parameter controls the creativity (high value such as 1.0) or conservatism (low value such as 0.0) of a generated Ollama LLM response. A lower temperature value is especially recommended for programming use cases. In the following example, we read a faulty JSON structure into a string variable and have it corrected by the Ollama model.
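
The temperature is passed to ollama.chat via the options dictionary. Besides temperature, Ollama accepts further model options such as top_p or seed; the small sketch below only illustrates how such options are handed over and uses placeholder values:


# Minimal sketch: passing model options (here a low temperature) to ollama.chat
import ollama

response = ollama.chat(
  model='mistral',
  messages=[{'role': 'user', 'content': 'Return a valid one-line JSON object with a "city" key.'}],
  options={
    'temperature': 0,  # conservative, reproducible output
    # 'top_p': 0.9,    # optional sampling option supported by Ollama
    # 'seed': 42,      # optional: fix the random seed for repeatable answers
  },
)
print(response['message']['content'])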

The faulty JSON content looks like:


{
    'name": "John's',
    "age": 30;
    'city': "New York"
    "country": "USA"
}

In the code below, we read this faulty JSON from the file test-with-errors.json, pass it to the Mistral model together with a system message for a coding assistant, and set the temperature option to 0 so that the correction stays as close to deterministic, syntactically correct output as possible.


# Importing the ollama module (the Ollama Python client)
import ollama

# Defining a function that reads a file and returns its content. 
def read_file(file):
    # Try-except block for handling FileNotFoundError exception if the file does not exist.
    try:
        with open(file, 'r') as f:
            data = f.read()   # Reading the file content and storing it in variable "data".
        return data  # Returning the read data.
    except FileNotFoundError:
        print("The file could not be found.")

# Calling the function with a filename as argument to get its content.
data = read_file('test-with-errors.json')

print("# Input JSON content with multiple errors")  # Printing the content of the JSON file.
print(data)  # Printing the content of the JSON file.

# Using ollama's chat method to get a response from the 'mistral' model.
# The system role sets up a coding assistant, the user message contains the faulty JSON,
# and the temperature is passed via the options dictionary.
ollama_response = ollama.chat(model='mistral', messages=[
  {
    'role': 'system',
    'content': 'You are a helpful coding assistant.',
  },
  {
    'role': 'user',
    'content': f'Please fix the following JSON file content: \n\n {data}',
  },
],
options={
  # 'temperature': 1.5, # very creative
  'temperature': 0 # very conservative (good for coding and correct syntax)
})

# Printing the response from ollama.
print(ollama_response['message']['content'])

The syntactically correct JSON output looks like this:


{
  "name": {
    "first": "John",
    "last": "Doe"
  },
  "age": 30,
  "city": "New York",
  "country": "USA"
}