Do you want an AI to analyze images from your security cameras without ever sending any data outside your home? That's easy with Home Assistant! I'll show you how to set it up step by step.

I recently created a Home Assistant integration called Ollama Vision. This integration makes it super easy to analyze images locally. Here’s what you need to get started:

Requirements

  • Home Assistant with HACS enabled.
  • Access to image files (via HTTP/HTTPS).
  • A server capable of running Ollama.
  • (Recommended) A GPU for faster inference. I'm using an old NVIDIA RTX 3060 with 12 GB of VRAM, which provides sub-second response times with the right models.

Let’s get started!

Step 1: Install Ollama

Ollama is a great choice for this integration because of its simplicity and flexibility. It supports a variety of vision and text models, making it ideal for local AI-powered image processing.

To install Ollama on your server, run the following command:

curl -fsSL https://ollama.com/install.sh | sh

This follows the official installation instructions. If you don’t have curl or git installed, you may need to install them first:

apt-get install curl git

Once Ollama is installed, verify that it detects your GPU (if applicable). To do this, pull and test a vision-enabled model like moondream:

ollama pull moondream
ollama run moondream

If the model runs successfully, an interactive prompt will appear where you can chat with the model. You can test image analysis by including the path to an image directly in the prompt:

ollama run moondream "Describe this image: /path/to/your/image.jpg"
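
To confirm that inference is actually running on the GPU, check where the loaded model is placed while it is running. A quick sanity check (the exact output format may vary between Ollama versions):

# List loaded models; the processor column should show the model running on the GPU, not the CPU
ollama ps

# On NVIDIA systems, the ollama process should also show up with VRAM allocated
nvidia-smi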

If your GPU isn’t being used on an NVIDIA system, your drivers may not be set up properly. Check out this guide to troubleshoot GPU issues.

Expose the Ollama API to the network

By default, Ollama runs on port 11434, but only listens on localhost. If Home Assistant is running on a different machine, you need to expose the API:

Edit the Ollama systemd service file:

vi /etc/systemd/system/ollama.service

Add the following line to the [Service] section:

Environment="OLLAMA_HOST=0.0.0.0"

Then restart the service:

systemctl daemon-reload
systemctl restart ollama

Now the API is reachable on your local network via the Ollama server's IP address, so Home Assistant can communicate with Ollama as long as both machines are on the same network.
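
Before moving on, it's worth verifying that the API is reachable from another machine. A simple check, assuming <OLLAMA-SERVER-IP> is the address of your Ollama server:

# Run this from the Home Assistant host or any other machine on the network;
# it should return a small JSON object with the Ollama version
curl http://<OLLAMA-SERVER-IP>:11434/api/version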

Step 2: Install Ollama Vision in Home Assistant


Now that Ollama is set up, let's install the integration.

  1. Ensure you have HACS installed. If not, follow this guide.
  2. Add the custom repository:
    • Open HACS in Home Assistant.
    • Click the three dots in the upper right corner.
    • Select Custom repositories.
    • Paste the repository link: https://github.com/remimikalsen/ollama_vision
    • Select Integration and click Add.
  3. Search for Ollama Vision in HACS and install it.
  4. Restart Home Assistant to apply changes.

Step 3: Configure Ollama Vision

  1. Go to Settings → Devices & Services in Home Assistant.
  2. Click + Add Integration and search for Ollama Vision.
  3. Enter the following details:

    • Name: Give your integration a unique name.
    • Vision Host: The IP address or hostname of your Ollama server.
    • Vision Port: Default is 11434.
    • Vision Model: The model name (default: moondream).
    • Vision Model Keep-Alive: Use -1 to keep it loaded indefinitely.
    • Enable Text Model: Toggle this on if you want an additional text model for enhanced descriptions.
    • Text Model Host: IP/hostname of the optional text model server.
    • Text Model Port: Default is 11434.
    • Text Model: The text model name (default: llama3.1).
    • Text Model Keep-Alive: Use -1 to keep it loaded indefinitely.

Click Submit to save your settings. If you want multiple configurations (e.g., different models for different scenarios), you can repeat this process.

NOTE - you may have to run the following command on your Ollama server before you can use the llama3.1 model:

ollama pull llama3.1

Enabling the text model adds some extra processing time before you get your answer back. Still, using both moondream and llama3.1 on the reference GPU, the RTX 3060, I get responses in less than a second.

Step 4: Automate image analysis in Home Assistant

With Ollama Vision, you can analyze images using the ollama_vision.analyze_image service. Here’s an example automation that describes a person detected by Frigate:

alias: Describe the person outside
description: ""
triggers:
  - platform: mqtt
    topic: frigate/events
conditions:
  - condition: template
    value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
  - condition: template
    value_template: "{{ 'front' in trigger.payload_json['after']['entered_zones'] or 'back' in trigger.payload_json['after']['entered_zones'] }}"
  - condition: template
    value_template: >-
      {% set last = state_attr('automation.describe_the_person_outside','last_triggered') %}
      {{ last is none or (now() - last).total_seconds() > 60 }}
actions:
  - service: ollama_vision.analyze_image
    data:
      image_url: "http://<HOME-ASSISTANT-IP>:8123/api/frigate/notifications/{{trigger.payload_json['after']['id']}}/thumbnail.jpg"
      image_name: person_outside
      use_text_model: true
      text_prompt: >-
        You are an AI that introduces people who visit. You are cheeky and love a good roast. Based on the description: <description>{description}</description>, introduce this guest.
      device_id: <YOUR OLLAMA VISION DEVICE ID>

This automation creates or updates a sensor like sensor.ollama_vision_person_outside with the description of the detected person. This will only work if you have set up Frigate to detect objects of type 'person' in zones named 'back' and/or 'front' on one or more of your security cameras. See the full reference configuration for Frigate setup details.
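
To make the result visible, you can pick the description up from that sensor in a follow-up automation. Below is a minimal sketch that sends it as a phone notification; it assumes the sensor name above and a notify service called notify.mobile_app_my_phone, which you should replace with your own:

alias: Announce the person outside
triggers:
  - platform: state
    entity_id: sensor.ollama_vision_person_outside
actions:
  # notify.mobile_app_my_phone is a placeholder - use your own notify service
  - service: notify.mobile_app_my_phone
    data:
      title: "Someone is outside"
      message: "{{ states('sensor.ollama_vision_person_outside') }}"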

Service parameters

Parameter        Required  Description
image_url        Yes       URL of the image to analyze.
image_name       Yes       Unique identifier for the image (used for sensor naming).
prompt           No        Prompt for the vision model.
device_id        No        Specifies the Ollama Vision instance to use.
use_text_model   No        Enables an additional text model for enhanced descriptions.
text_prompt      No        Prompt for the text model, referencing {description}.
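
For a quick manual test, a call only needs the two required parameters. Here is a minimal example you can run from Developer Tools in Home Assistant; the image URL is just a placeholder, so point it at any image Home Assistant can reach:

service: ollama_vision.analyze_image
data:
  image_url: "http://<CAMERA-OR-HA-IP>/snapshot.jpg"  # placeholder - any reachable image URL works
  image_name: test_image
  prompt: "Describe what you see in this image."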

Conclusion

With Ollama Vision, you can describe security camera snapshots, customize descriptions, and keep everything local. Whether you’re detecting people, reading license plates, or identifying objects, this integration gives you full control over AI-powered image analysis in Home Assistant.

Give it a try!
