Yes, it’s entirely feasible to run a fully offline, local chatbot pipeline (STT > LLM > TTS) on your Jetson Orin Nano Developer Kit using Whisper for speech-to-text, a lightweight LLM for response generation, and Piper for text-to-speech. The Orin Nano’s 8GB shared memory and GPU can handle this with optimized, quantized models (e.g., 1-7B parameter LLMs at 4-8 bits), keeping latency reasonable (typically 1-5 seconds end-to-end for short queries). Several open-source frameworks support this setup, leveraging dusty-nv’s jetson-containers for easy deployment. Below, I’ll outline the most straightforward options based on your existing setup (e.g., wyoming-piper already running) and community-validated examples.
## Recommended Approach: Home Assistant with Wyoming Protocol (Easiest Integration with Your Current Setup)
Home Assistant (HA) provides a flexible voice assistant pipeline via its Assist feature, which chains STT, LLM/intent handling, and TTS. It’s lightweight enough for the Orin Nano when run in Docker, and integrates seamlessly with jetson-containers’ Wyoming servers for GPU-accelerated Whisper and Piper. This runs fully locally/offline after setup.
### Key Components
- STT: wyoming-whisper (uses faster-whisper, a CUDA-optimized Whisper implementation; models like `tiny.en` or `base.en` fit easily, with ~0.5-1 s latency).
- LLM: Ollama (e.g., with Qwen2.5:1.5B or Llama-3.2:1B quantized to 4 bits) or dusty-nv's `nano_llm` (supports Mistral-7B, Phi-3, etc., via TensorRT for faster inference).
- TTS: Your existing wyoming-piper (high-quality voices like `en_US-lessac-medium` run in real time).

### Setup Steps
1. Install Home Assistant: Run HA in a Docker container on your Jetson (JetPack 6.2+ recommended). Use the official ARM64 image:

   ```
   docker run -d --name home-assistant --restart=unless-stopped \
     -e TZ=Your/Timezone \
     -v /path/to/ha/config:/config \
     --network=host \
     homeassistant/home-assistant:stable
   ```

   Access the UI at `http://<jetson-ip>:8123` and complete onboarding.
2. Run Wyoming Servers via jetson-containers:
   - For Whisper: `jetson-containers run $(autotag wyoming-whisper)` starts a server on port 10300 (host networking, no explicit port mapping needed).
   - Piper is already running (on port 10200).
3. Set Up LLM:
   - Run Ollama in a container: `jetson-containers run $(autotag ollama)`, then pull a small model: `ollama pull qwen2.5:1.5b` (or similar; ~1 GB quantized).
   - Alternatively, for better Jetson optimization: `jetson-containers run $(autotag nano_llm)` and load a model like Mistral-7B via its API.
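Once the model is pulled, you can sanity-check it over Ollama's local REST API before wiring it into HA. A minimal Python sketch (the `/api/generate` endpoint and port 11434 are Ollama's documented defaults; the model tag matches the one pulled above):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(prompt, model="qwen2.5:1.5b"):
    """JSON body for a non-streaming /api/generate request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def ask(prompt):
    """Send a prompt to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires the Ollama container to be running:
# print(ask("Reply with one short sentence: what is a Jetson Orin Nano?"))
```

Ollama also exposes an OpenAI-compatible endpoint (`/v1/chat/completions`), which is what HA's OpenAI-style LLM integrations can point at.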
4. Configure HA Voice Pipeline:
   - In the HA UI, go to Settings > Voice Assistants > Add Assistant.
   - Set up a pipeline: STT = Wyoming Whisper (point to `tcp://localhost:10300`), Conversation/Intent = Ollama or a custom LLM integration (via HA's OpenAI-compatible API support), TTS = Wyoming Piper (`tcp://localhost:10200`).
   - Add a wake word if desired (e.g., via wyoming-openwakeword: `jetson-containers run $(autotag wyoming-openwakeword)`).
   - Test via HA's Debug Pipeline tool.
5. Run the Chatbot:
   - Use a microphone (e.g., a USB or CSI camera mic) and speakers connected to the Jetson.
   - Speak to trigger the pipeline; HA handles the flow and outputs audio.
   - For a standalone feel, expose it via HA's Expose feature or integrate with a simple script/app.
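The same loop can also be driven without HA at all. Below is a hedged sketch of a push-to-talk pipeline chaining faster-whisper, the `ollama` CLI, and the `piper` CLI; the file names (`query.wav`, `reply.wav`), model tag, and voice model path are placeholders for whatever you actually installed:

```python
import subprocess

def transcribe(wav_path):
    """STT with faster-whisper (the same engine wyoming-whisper uses)."""
    from faster_whisper import WhisperModel  # lazy import; CUDA build on the Jetson
    model = WhisperModel("tiny.en", device="cuda", compute_type="int8_float16")
    segments, _info = model.transcribe(wav_path)
    return " ".join(seg.text.strip() for seg in segments)

def ask_llm(prompt, model="qwen2.5:1.5b"):
    """One LLM turn via the ollama CLI; returns the model's text reply."""
    out = subprocess.run(["ollama", "run", model, prompt],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def piper_cmd(voice_model, out_wav):
    """Command line for the piper CLI, which reads text on stdin."""
    return ["piper", "--model", voice_model, "--output_file", out_wav]

def speak(text, voice_model="en_US-lessac-medium.onnx", out_wav="reply.wav"):
    """TTS with piper, then play the result through ALSA."""
    subprocess.run(piper_cmd(voice_model, out_wav), input=text, text=True, check=True)
    subprocess.run(["aplay", out_wav], check=True)

def main():
    # Record the query first, e.g.: arecord -d 5 -f S16_LE -r 16000 query.wav
    query = transcribe("query.wav")
    speak(ask_llm(query))

# Call main() after recording query.wav.
```

This trades HA's wake-word and intent handling for a simpler, fully scriptable flow; each stage can be swapped independently.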
### Resource Notes
- Total memory: ~4-6 GB peak (Whisper tiny: 0.5 GB; LLM 1.5B at q4: 1-2 GB; Piper: 0.2 GB; HA: 1 GB).
- Tune for performance: Use smaller models if latency spikes; enable TensorRT in nano_llm for 2-3x faster LLM inference.
- Challenges: Initial setup may require debugging audio devices (use `arecord -l` and `--device /dev/snd` in containers).
This leverages your wyoming-piper and is modular—swap components easily.
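For the audio-device debugging mentioned under Resource Notes, a small helper can parse `arecord -l` output into `hw:card,device` identifiers; the sample line below is illustrative:

```python
import re
import subprocess

# Matches lines like: "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]"
CARD_RE = re.compile(r"card (\d+): \S+ \[([^\]]+)\], device (\d+):")

def parse_arecord_list(output):
    """Extract (card_index, device_index, card_name) tuples from `arecord -l` output."""
    return [(int(m.group(1)), int(m.group(3)), m.group(2))
            for m in CARD_RE.finditer(output)]

def capture_devices():
    """Run `arecord -l` on the host and parse the capture devices it reports."""
    out = subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout
    return parse_arecord_list(out)

# Offline demo on a canned line (real usage: call capture_devices() on the Jetson):
sample = "card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]"
print(parse_arecord_list(sample))  # -> [(1, 0, 'USB Audio Device')]
```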
## Alternative: Open Voice OS (Standalone Voice Assistant)
If you prefer a dedicated voice OS without HA’s smart home focus, use Open Voice OS (OVOS). It’s demoed running offline on the Orin Nano 8GB with K3s (lightweight Kubernetes) for orchestration.
### Key Components
- STT: Whisper.
- LLM: Ollama with Qwen2.5:1.5B (or similar small model).
- TTS: Piper.
### Setup Steps
1. Install K3s and Rancher: Follow standard Ubuntu 22.04/JetPack 6.2 setup, then install K3s with `curl -sfL https://get.k3s.io | sh -`. Install Rancher for management per the official docs.
2. Deploy OVOS Stack: Use OVOS Docker images or YAML manifests for pods like `ovos-core`, `stt-whisper`, `tts-piper`, and `ollama`.
   - Pull models: e.g., `ollama pull qwen2.5:1.5b`.
   - Configure skills (e.g., chat, weather) via OVOS config.
3. Run and Test: Access via mic/speakers; the demo shows conversational responses.

See the NVIDIA forum demo video for visuals. Resource usage is similar to the HA approach (~5 GB), fully offline.
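As an illustration of those manifests, a minimal Deployment for the `ollama` pod might look like the following; the image tag, runtime class, port, and storage path are assumptions to adapt to your cluster:

```yaml
# Hypothetical K3s manifest for the ollama pod; adjust names and paths to your setup.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  replicas: 1
  selector:
    matchLabels: {app: ollama}
  template:
    metadata:
      labels: {app: ollama}
    spec:
      runtimeClassName: nvidia        # expose the Jetson GPU via the NVIDIA runtime
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - containerPort: 11434    # Ollama's default API port
          volumeMounts:
            - {name: models, mountPath: /root/.ollama}
      volumes:
        - name: models
          hostPath: {path: /var/lib/ollama}   # persist pulled models across restarts
```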
## Alternative: NanoLLM with LlamaSpeak (If You Prefer Riva Over Whisper)
For a web-based interface, use dusty-nv's NanoLLM voice agent (e.g., LlamaSpeak). It supports Piper TTS and can be adapted for Whisper STT (via `--asr=whisper` if available, or a custom integration), with nano_llm as the LLM backend.

### Quick Setup
- Run:

  ```
  jetson-containers run $(autotag nano_llm) \
    python3 -m nano_llm.agents.web_chat --api=mlc \
      --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
  ```

- Access at `https://<jetson-ip>:8050` and speak via the browser mic.
- Adapt for Whisper: Check the NanoLLM docs for ASR options or chain with wyoming-whisper.

This runs well on the Orin Nano with 4-bit models.
## General Tips
- Audio Hardware: Ensure mic/speakers work (test with `arecord`/`aplay`); pass devices to containers (e.g., `--device /dev/snd`).
- Optimization: Quantize models (e.g., via Hugging Face); monitor with `jtop`.
- Customization: Add VAD/hotwords for hands-free use; extend to RAG for document querying as in your intent.
- Troubleshooting: Check logs (`docker logs <container>`); community forums like NVIDIA Developer or Reddit's r/JetsonNano have similar setups.
Start with the HA approach—it’s the least friction given your setup. If you hit issues, provide more details like error logs.