Local chatbot pipeline on Jetson Orin Nano Developer Kit via HA Pipeline

Yes, it’s entirely feasible to run a fully offline, local chatbot pipeline (STT > LLM > TTS) on your Jetson Orin Nano Developer Kit using Whisper for speech-to-text, a lightweight LLM for response generation, and Piper for text-to-speech. The Orin Nano’s 8GB shared memory and GPU can handle this with optimized, quantized models (e.g., 1-7B parameter LLMs at 4-8 bits), keeping latency reasonable (typically 1-5 seconds end-to-end for short queries). Several open-source frameworks support this setup, leveraging dusty-nv’s jetson-containers for easy deployment. Below, I’ll outline the most straightforward options based on your existing setup (e.g., wyoming-piper already running) and community-validated examples.

Recommended Approach: Home Assistant with Wyoming Protocol (Easiest Integration with Your Current Setup)

Home Assistant (HA) provides a flexible voice assistant pipeline via its Assist feature, which chains STT, LLM/intent handling, and TTS. It’s lightweight enough for the Orin Nano when run in Docker, and integrates seamlessly with jetson-containers’ Wyoming servers for GPU-accelerated Whisper and Piper. This runs fully locally/offline after setup.

Key Components

  • STT: wyoming-whisper (uses faster-whisper, a CUDA-optimized Whisper implementation; models like tiny.en or base.en fit easily, with ~0.5-1s latency).
  • LLM: Use Ollama (e.g., with Qwen2.5:1.5B or Llama-3.2:1B quantized to 4 bits) or dusty-nv’s nano_llm (supports Mistral-7B, Phi-3, etc., via TensorRT for faster inference).
  • TTS: Your existing wyoming-piper (high-quality voices like en-us-lessac-medium run in real-time).

Setup Steps

  1. Install Home Assistant: Run HA in a Docker container on your Jetson (JetPack 6.2+ recommended). Use the official ARM64 image:

    docker run -d --name home-assistant --restart=unless-stopped -e TZ=Your/Timezone -v /path/to/ha/config:/config --network=host homeassistant/home-assistant:stable
    

    Access the UI at http://<jetson-ip>:8123 and complete onboarding.

  2. Run Wyoming Servers via jetson-containers (a launch sketch follows this list):

    • For Whisper: jetson-containers run $(autotag wyoming-whisper)
      • This starts a server on port 10300 (host networking, no explicit port mapping needed).
    • Piper is already running (on port 10200).
  3. Set Up LLM:

    • Run Ollama in a container: jetson-containers run $(autotag ollama)
      • Pull a small model: ollama pull qwen2.5:1.5b (or similar; ~1GB quantized). A quick API smoke test is sketched after this list.
    • Alternatively, for better Jetson optimization: jetson-containers run $(autotag nano_llm) and load a model like Mistral-7B via its API.
  4. Configure HA Voice Pipeline:

    • In HA UI: Go to Settings > Voice Assistants > Add Assistant.
    • Set up a pipeline: STT = Wyoming Whisper (add the Wyoming integration under Settings > Devices & Services, pointing at localhost, port 10300), Conversation/Intent = HA’s Ollama integration (point it at http://localhost:11434) or another local LLM integration, TTS = Wyoming Piper (localhost, port 10200).
    • Add a wake word (e.g., via wyoming-openwakeword if desired: jetson-containers run $(autotag wyoming-openwakeword)).
    • Test via HA’s Debug Pipeline tool.
  5. Run the Chatbot:

    • Use a microphone (e.g., a USB mic or a webcam’s built-in mic) and speakers connected to the Jetson.
    • Speak to trigger the pipeline; HA handles the flow and outputs audio.
    • For a standalone feel, expose entities via HA’s Expose settings or drive the pipeline from a simple script/app via HA’s REST API (a sketch follows this list).
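
To make steps 2-3 concrete, here’s a minimal launch sequence (run each long-lived command in its own terminal, or pass -d to detach), assuming jetson-containers is installed and wyoming-piper is already up:

    # Step 2: GPU-accelerated Whisper STT (Wyoming server, default port 10300)
    jetson-containers run $(autotag wyoming-whisper)

    # Step 3: Ollama server, then pull a small quantized chat model
    jetson-containers run $(autotag ollama)
    ollama pull qwen2.5:1.5b   # inside the container, or via a host ollama CLI

    # Sanity check: both Wyoming ports should accept TCP connections
    nc -zv localhost 10300   # wyoming-whisper
    nc -zv localhost 10200   # wyoming-piper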
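
Before wiring the LLM into HA (step 4), it’s worth confirming Ollama answers on its standard REST API (default port 11434); a non-streaming test:

    curl http://localhost:11434/api/generate -d '{
      "model": "qwen2.5:1.5b",
      "prompt": "Reply with one short sentence.",
      "stream": false
    }'

A JSON response with generated text means the model is loaded, and the HA integration should connect without trouble.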
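
For the script/app route in step 5, HA’s REST API can push text through the Assist pipeline directly, which is handy for testing without a microphone. A minimal sketch, assuming you’ve created a long-lived access token under your HA user profile (HA_TOKEN below is a placeholder):

    # Send a text query through the default Assist pipeline; the
    # conversation response comes back as JSON.
    curl -X POST http://localhost:8123/api/conversation/process \
      -H "Authorization: Bearer $HA_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"text": "What time is it?", "language": "en"}'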

Resource Notes

  • Total memory: ~4-6GB peak (Whisper tiny: ~0.5GB, 1.5B LLM at 4-bit: 1-2GB, Piper: ~0.2GB, HA: ~1GB, plus OS and CUDA overhead).
  • Tune for performance: Use smaller models if latency spikes; enable TensorRT in nano_llm for 2-3x faster LLM inference.
  • Challenges: Initial setup may require debugging audio devices (use arecord -l and --device /dev/snd in containers).
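
To watch that budget live while the pipeline runs, jtop (from the jetson-stats package, also mentioned under General Tips) or the built-in tegrastats is enough:

    # jtop: full GPU/CPU/memory dashboard; log out/in or reboot once after
    # installing so its background service starts
    sudo pip3 install -U jetson-stats
    jtop

    # tegrastats: lightweight built-in alternative, prints one line per interval
    tegrastats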

This leverages your wyoming-piper and is modular—swap components easily.

Alternative: Open Voice OS (Standalone Voice Assistant)

If you prefer a dedicated voice OS without HA’s smart home focus, use Open Voice OS (OVOS). It’s demoed running offline on the Orin Nano 8GB with K3s (lightweight Kubernetes) for orchestration.

Key Components

  • STT: Whisper.
  • LLM: Ollama with Qwen2.5:1.5B (or similar small model).
  • TTS: Piper.

Setup Steps

  1. Install K3s and Rancher: Follow standard Ubuntu 22.04/JetPack 6.2 setup, then install K3s: curl -sfL https://get.k3s.io | sh -.

    • Install Rancher for management: Follow official docs.
  2. Deploy OVOS Stack: Use OVOS Docker images or YAML manifests for pods like ovos-core, stt-whisper, tts-piper, and ollama (a minimal kubectl sketch follows this list).

    • Pull models: e.g., ollama pull qwen2.5:1.5b.
    • Configure skills (e.g., chat, weather) via OVOS config.
  3. Run and Test: Access via mic/speakers; demo shows conversational responses.
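
A minimal sketch of step 2 using plain kubectl and the public ollama/ollama image (the OVOS image names vary by release, so treat the pattern, not the names, as the takeaway):

    # Confirm the K3s node is ready
    kubectl get nodes

    # Deploy Ollama and expose its API inside the cluster
    kubectl create deployment ollama --image=ollama/ollama
    kubectl expose deployment ollama --port=11434

    # Pull the chat model inside the running pod
    kubectl exec deploy/ollama -- ollama pull qwen2.5:1.5b

    # ovos-core, stt-whisper, and tts-piper pods follow the same pattern,
    # using images/manifests from the OVOS Docker repository

Note that pods only see the GPU once the NVIDIA container runtime and device plugin are configured in K3s, so verify that before judging performance.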

See the NVIDIA forum demo video for visuals. Resource usage is similar to the HA approach (~5GB), and everything runs fully offline.

Alternative: NanoLLM with LlamaSpeak (If You Prefer Riva Over Whisper)

For a web-based interface, use dusty-nv’s NanoLLM voice agent (e.g., LlamaSpeak). It supports Piper TTS and can be adapted for Whisper STT (via --asr=whisper if available, or custom integration), with nano_llm for the LLM.

Quick Setup

  • Run: jetson-containers run $(autotag nano_llm) python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
  • Access at https://<jetson-ip>:8050; speak via browser mic.
  • Adapt for Whisper: Check NanoLLM docs for ASR options or chain with wyoming-whisper.

This runs well on Orin Nano with 4-bit models.

General Tips

  • Audio Hardware: Ensure mic/speakers work (test with arecord/aplay); pass devices to containers (e.g., --device /dev/snd).
  • Optimization: Quantize models (e.g., via Hugging Face); monitor with jtop.
  • Customization: Add VAD/hotwords for hands-free use; extend to RAG for document querying, per your stated intent.
  • Troubleshooting: Check logs (docker logs <container>); community forums like NVIDIA Developer or Reddit’s r/JetsonNano have similar setups.
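
For the audio checks above, a typical sequence looks like this (card/device numbers vary, so adjust plughw to match the arecord -l output):

    # List capture and playback devices
    arecord -l
    aplay -l

    # Record 5 seconds from a USB mic (card 1, device 0 here) and play it back
    arecord -D plughw:1,0 -f cd -d 5 test.wav
    aplay test.wav

    # Pass the sound devices through to a container
    jetson-containers run --device /dev/snd $(autotag wyoming-whisper)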

Start with the HA approach—it’s the least friction given your setup. If you hit issues, provide more details like error logs.