Do you want an AI to analyze images from your security cameras, and never send any data outside your home? This is easy with Home Assistant! I'll show you how to set it up step by step.
I recently created a Home Assistant integration called Ollama Vision. This integration makes it super easy to analyze images locally. To follow along, you'll need a machine to run Ollama on (ideally with a GPU) and a working Home Assistant installation, optionally with Frigate if you want to analyze detections from your security cameras.
Let’s get started!
Ollama is a great choice for this integration because of its simplicity and flexibility. It supports a variety of vision and text models, making it ideal for local AI-powered image processing.
To install Ollama on your server, run the following command:
curl -fsSL https://ollama.com/install.sh | sh
This follows the official installation instructions. If you don't have curl or git installed, you may need to install them first:
apt-get install curl git
Once Ollama is installed, verify that it detects your GPU (if applicable). To do this, pull and test a vision-enabled model like moondream:
ollama pull moondream
ollama run moondream
If the model runs successfully, a prompt will appear where you can enter text-based commands. You can test image analysis by including an image path in the prompt:
ollama run moondream 'Describe this image: ./path/to/your/image.jpg'
If your GPU isn’t being used on an NVIDIA system, your drivers may not be set up properly. Check out this guide to troubleshoot GPU issues.
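Before digging into driver issues, it can help to confirm whether Ollama is actually using the GPU at all. Assuming an NVIDIA card and a reasonably recent Ollama version, nvidia-smi shows the driver and GPU status, and ollama ps shows whether a loaded model is running on the CPU or the GPU:
nvidia-smi
ollama ps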
By default, Ollama runs on port 11434, but only listens on localhost. If Home Assistant is running on a different machine, you need to expose the API:
Edit the Ollama systemd service file:
vi /etc/systemd/system/ollama.service
Add the following line to the [Service] section:
Environment="OLLAMA_HOST=0.0.0.0"
Then restart the service:
systemctl daemon-reload
systemctl restart ollama
Now, the API is accessible on your local network using the IP address of the Ollama server. This allows Home Assistant to communicate with Ollama as long as Home Assistant is on the same network as your Ollama server.
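If you want to verify that the API is reachable from another machine before moving on, you can query the version endpoint (the IP address is a placeholder for your Ollama server's address):
curl http://<OLLAMA-SERVER-IP>:11434/api/version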
Now that Ollama is set up, let's install the integration. You'll find it on GitHub:
https://github.com/remimikalsen/ollama_vision
Install it in Home Assistant (for example as a custom repository in HACS), restart Home Assistant, and add the Ollama Vision integration. Enter the following details:
- Vision host: the IP address of your Ollama server.
- Vision port: the Ollama API port (default 11434).
- Vision model: the model used for image analysis (e.g. moondream).
- Vision keep-alive: how long the model stays loaded in memory; use -1 to keep it loaded indefinitely.
- Optionally, enable a text model for rewriting descriptions, with its own host, port (default 11434), model (e.g. llama3.1) and keep-alive (again, -1 to keep it loaded indefinitely).

Click Submit to save your settings. If you want multiple configurations (e.g., different models for different scenarios), you can repeat this process.
NOTE - you may have to run the following command on your Ollama server in order to use the llama3.1 model:
ollama pull llama3.1
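You can verify that both models are now available on your Ollama server by listing the locally installed models:
ollama list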
If you enable a text model on top of the vision model, you add some extra processing time before you get your answer back. Still, using both moondream and llama3.1 on my reference GPU, an RTX 3060, I get responses in less than a second.
With Ollama Vision, you can analyze images using the ollama_vision.analyze_image service. Here's an example automation that describes a person detected by Frigate:
alias: Describe the person outside
description: ""
triggers:
- platform: mqtt
topic: frigate/events
conditions:
- condition: template
value_template: "{{ trigger.payload_json['after']['label'] == 'person' }}"
- condition: template
value_template: "{{ 'front' in trigger.payload_json['after']['entered_zones'] or 'back' in trigger.payload_json['after']['entered_zones'] }}"
- condition: template
value_template: >-
{% set last = state_attr('automation.describe_the_person_outside','last_triggered') %}
{{ last is none or (now() - last).total_seconds() > 60 }}
actions:
- service: ollama_vision.analyze_image
data:
image_url: "http://<HOME-ASSISTANT-IP>:8123/api/frigate/notifications/{{trigger.payload_json['after']['id']}}/thumbnail.jpg"
image_name: person_outside
use_text_model: true
text_prompt: >-
You are an AI that introduces people who visit. You are cheeky and love a good roast. Based on the description: <description>{description}</description>, introduce this guest.
device_id: <YOUR OLLAMA VISION DEVICE ID>
This automation creates or updates a sensor like sensor.ollama_vision_person_outside with the description of the detected person. This will only work if you have set up Frigate to detect objects of type 'person' in zones named 'back' and/or 'front' on one or more of your security cameras. See the full reference configuration for Frigate setup details.
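As a follow-up, here is a minimal sketch of how you might forward that description to your phone once the sensor updates. The notify service name below is a placeholder; use whatever notify service your setup provides:
alias: Notify about the person outside
description: ""
triggers:
  - platform: state
    entity_id: sensor.ollama_vision_person_outside
actions:
  - service: notify.mobile_app_your_phone # placeholder: replace with your own notify service
    data:
      title: "Person outside"
      message: "{{ states('sensor.ollama_vision_person_outside') }}"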
The ollama_vision.analyze_image service accepts the following parameters:

| Parameter | Required | Description |
|---|---|---|
| image_url | Yes | URL of the image to analyze. |
| image_name | Yes | Unique identifier for the image (used for sensor naming). |
| prompt | No | Prompt for the vision model. |
| device_id | No | Specifies the Ollama Vision instance to use. |
| use_text_model | No | Enables an additional text model for enhanced descriptions. |
| text_prompt | No | Prompt for the text model, referencing {description}. |
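If you just want to experiment before writing a full automation, you can also call the service from Developer Tools with only the required parameters. The image URL below is a placeholder for any image your Home Assistant instance can reach:
service: ollama_vision.analyze_image
data:
  image_url: "http://<CAMERA-IP>/snapshot.jpg" # placeholder: any reachable image URL
  image_name: test_image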
With Ollama Vision, you can describe security camera snapshots, customize descriptions, and keep everything local. Whether you’re detecting people, reading license plates, or identifying objects, this integration gives you full control over AI-powered image analysis in Home Assistant.
Give it a try!