Local chatbot pipeline on Jetson Orin Nano Developer Kit via HA Pipeline

Yes, it’s entirely feasible to run a fully offline, local chatbot pipeline (STT > LLM > TTS) on your Jetson Orin Nano Developer Kit using Whisper for speech-to-text, a lightweight LLM for response generation, and Piper for text-to-speech. The Orin Nano’s 8GB shared memory and GPU can handle this with optimized, quantized models (e.g., 1-7B parameter LLMs at 4-8 bits), keeping latency reasonable (typically 1-5 seconds end-to-end for short queries). Several open-source frameworks support this setup, leveraging dusty-nv’s jetson-containers for easy deployment. Below, I’ll outline the most straightforward options based on your existing setup (e.g., wyoming-piper already running) and community-validated examples.

Recommended Approach: Home Assistant with Wyoming Protocol (Easiest Integration with Your Current Setup)

Home Assistant (HA) provides a flexible voice assistant pipeline via its Assist feature, which chains STT, LLM/intent handling, and TTS. It’s lightweight enough for the Orin Nano when run in Docker, and integrates seamlessly with jetson-containers’ Wyoming servers for GPU-accelerated Whisper and Piper. This runs fully locally/offline after setup.

Key Components

  • STT: wyoming-whisper (uses faster-whisper, a CUDA-optimized Whisper implementation; models like tiny.en or base.en fit easily, with ~0.5-1s latency).
  • LLM: Use Ollama (e.g., with Qwen2.5:1.5B or Llama-3.2:1B quantized to 4 bits) or dusty-nv’s nano_llm (supports Mistral-7B, Phi-3, etc., via TensorRT for faster inference).
  • TTS: Your existing wyoming-piper (high-quality voices like en-us-lessac-medium run in real-time).

Setup Steps

  1. Install Home Assistant: Run HA in a Docker container on your Jetson (JetPack 6.2+ recommended). Use the official ARM64 image:

    docker run -d --name home-assistant --restart=unless-stopped -e TZ=Your/Timezone -v /path/to/ha/config:/config --network=host homeassistant/home-assistant:stable
    

    Access the UI at http://<jetson-ip>:8123 and complete onboarding.

  2. Run Wyoming Servers via jetson-containers (a launch sketch follows this list):

    • For Whisper: jetson-containers run $(autotag wyoming-whisper)
      • This starts a server on port 10300 (host networking, no explicit port mapping needed).
    • Piper is already running (on port 10200).
  3. Set Up LLM:

    • Run Ollama in a container: jetson-containers run $(autotag ollama)
      • Pull a small model: ollama pull qwen2.5:1.5b (or similar; ~1GB quantized). A quick API smoke test is sketched after this list.
    • Alternatively, for better Jetson optimization: jetson-containers run $(autotag nano_llm) and load a model like Mistral-7B via its API.
  4. Configure HA Voice Pipeline:

    • In HA UI: Go to Settings > Voice Assistants > Add Assistant.
    • Set up a pipeline: STT = Wyoming Whisper (add the Wyoming integration under Settings > Devices & Services, pointing at localhost, port 10300), Conversation/Intent = HA’s Ollama integration (point it at http://localhost:11434) or another local LLM integration, TTS = Wyoming Piper (localhost, port 10200).
    • Add a wake word (e.g., via wyoming-openwakeword if desired: jetson-containers run $(autotag wyoming-openwakeword)).
    • Test via HA’s Debug Pipeline tool.
  5. Run the Chatbot:

    • Use a microphone (e.g., a USB mic or a webcam’s built-in mic) and speakers connected to the Jetson.
    • Speak to trigger the pipeline; HA handles the flow and outputs audio.
    • For a standalone feel, expose entities via HA’s Expose settings or drive the pipeline from a simple script/app via HA’s REST API (a sketch follows this list).
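
To make steps 2-3 concrete, here’s a minimal launch sequence (run each long-lived command in its own terminal, or pass -d to detach), assuming jetson-containers is installed and wyoming-piper is already up:

    # Step 2: GPU-accelerated Whisper STT (Wyoming server, default port 10300)
    jetson-containers run $(autotag wyoming-whisper)

    # Step 3: Ollama server, then pull a small quantized chat model
    jetson-containers run $(autotag ollama)
    ollama pull qwen2.5:1.5b   # inside the container, or via a host ollama CLI

    # Sanity check: both Wyoming ports should accept TCP connections
    nc -zv localhost 10300   # wyoming-whisper
    nc -zv localhost 10200   # wyoming-piper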
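
Before wiring the LLM into HA (step 4), it’s worth confirming Ollama answers on its standard REST API (default port 11434); a non-streaming test:

    curl http://localhost:11434/api/generate -d '{
      "model": "qwen2.5:1.5b",
      "prompt": "Reply with one short sentence.",
      "stream": false
    }'

A JSON response with generated text means the model is loaded, and the HA integration should connect without trouble.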
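
For the script/app route in step 5, HA’s REST API can push text through the Assist pipeline directly, which is handy for testing without a microphone. A minimal sketch, assuming you’ve created a long-lived access token under your HA user profile (HA_TOKEN below is a placeholder):

    # Send a text query through the default Assist pipeline; the
    # conversation response comes back as JSON.
    curl -X POST http://localhost:8123/api/conversation/process \
      -H "Authorization: Bearer $HA_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"text": "What time is it?", "language": "en"}'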

Resource Notes

  • Total memory: ~4-6GB peak (Whisper tiny: ~0.5GB, 1.5B LLM at 4-bit: 1-2GB, Piper: ~0.2GB, HA: ~1GB, plus OS and CUDA overhead).
  • Tune for performance: Use smaller models if latency spikes; enable TensorRT in nano_llm for 2-3x faster LLM inference.
  • Challenges: Initial setup may require debugging audio devices (use arecord -l and --device /dev/snd in containers).
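
To watch that budget live while the pipeline runs, jtop (from the jetson-stats package, also mentioned under General Tips) or the built-in tegrastats is enough:

    # jtop: full GPU/CPU/memory dashboard; log out/in or reboot once after
    # installing so its background service starts
    sudo pip3 install -U jetson-stats
    jtop

    # tegrastats: lightweight built-in alternative, prints one line per interval
    tegrastats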

This leverages your wyoming-piper and is modular—swap components easily.

Alternative: Open Voice OS (Standalone Voice Assistant)

If you prefer a dedicated voice OS without HA’s smart home focus, use Open Voice OS (OVOS). It’s demoed running offline on the Orin Nano 8GB with K3s (lightweight Kubernetes) for orchestration.

Key Components

  • STT: Whisper.
  • LLM: Ollama with Qwen2.5:1.5B (or similar small model).
  • TTS: Piper.

Setup Steps

  1. Install K3s and Rancher: Follow standard Ubuntu 22.04/JetPack 6.2 setup, then install K3s: curl -sfL https://get.k3s.io | sh -.

    • Install Rancher for management: Follow official docs.
  2. Deploy OVOS Stack: Use OVOS Docker images or YAML manifests for pods like ovos-core, stt-whisper, tts-piper, and ollama (a minimal kubectl sketch follows this list).

    • Pull models: e.g., ollama pull qwen2.5:1.5b.
    • Configure skills (e.g., chat, weather) via OVOS config.
  3. Run and Test: Access via mic/speakers; demo shows conversational responses.
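
A minimal sketch of step 2 using plain kubectl and the public ollama/ollama image (the OVOS image names vary by release, so treat the pattern, not the names, as the takeaway):

    # Confirm the K3s node is ready
    kubectl get nodes

    # Deploy Ollama and expose its API inside the cluster
    kubectl create deployment ollama --image=ollama/ollama
    kubectl expose deployment ollama --port=11434

    # Pull the chat model inside the running pod
    kubectl exec deploy/ollama -- ollama pull qwen2.5:1.5b

    # ovos-core, stt-whisper, and tts-piper pods follow the same pattern,
    # using images/manifests from the OVOS Docker repository

Note that pods only see the GPU once the NVIDIA container runtime and device plugin are configured in K3s, so verify that before judging performance.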

See the NVIDIA forum demo video for visuals. Resource usage is similar to the HA approach (~5GB), and everything runs fully offline.

Alternative: NanoLLM with LlamaSpeak (If You Prefer Riva Over Whisper)

For a web-based interface, use dusty-nv’s NanoLLM voice agent (e.g., LlamaSpeak). It supports Piper TTS and can be adapted for Whisper STT (via --asr=whisper if available, or custom integration), with nano_llm for the LLM.

Quick Setup

  • Run: jetson-containers run $(autotag nano_llm) python3 -m nano_llm.agents.web_chat --api=mlc --model meta-llama/Meta-Llama-3-8B-Instruct --asr=riva --tts=piper
  • Access at https://<jetson-ip>:8050; speak via browser mic.
  • Adapt for Whisper: Check NanoLLM docs for ASR options or chain with wyoming-whisper.

This runs well on Orin Nano with 4-bit models.

General Tips

  • Audio Hardware: Ensure mic/speakers work (test with arecord/aplay); pass devices to containers (e.g., --device /dev/snd).
  • Optimization: Quantize models (e.g., via Hugging Face); monitor with jtop.
  • Customization: Add VAD/hotwords for hands-free use; extend to RAG for document querying, per your stated intent.
  • Troubleshooting: Check logs (docker logs <container>); community forums like NVIDIA Developer or Reddit’s r/JetsonNano have similar setups.
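
For the audio checks above, a typical sequence looks like this (card/device numbers vary, so adjust plughw to match the arecord -l output):

    # List capture and playback devices
    arecord -l
    aplay -l

    # Record 5 seconds from a USB mic (card 1, device 0 here) and play it back
    arecord -D plughw:1,0 -f cd -d 5 test.wav
    aplay test.wav

    # Pass the sound devices through to a container
    jetson-containers run --device /dev/snd $(autotag wyoming-whisper)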

Start with the HA approach—it’s the least friction given your setup. If you hit issues, provide more details like error logs.