Holler is a cross-platform, memory-efficient push-to-talk dictation app — a walkie-talkie for your agents. Release the key and your speech is transcribed, injected at the cursor, copied to the clipboard, and saved to a searchable local history. It reads text back aloud, too.
Native, lightweight, and privacy-first, with pluggable providers you swap in config.
Hold ⌃⌥Space, speak, release. Text lands at your cursor, on the clipboard, and in history.
Pluggable transcription: Deepgram (nova-3) and OpenAI, selectable in config.
Read your selection or clipboard aloud — offline native voice or cloud OpenAI / Deepgram, with replay & stop.
A recording pill with a live level meter, a read-aloud status popup, and a clipboard toast — CPU-rendered, no WebView.
An on-demand window to pick providers, voices, injection mode, and rebind hotkeys live — no restart.
Every transcript saved to a local SQLite database that you own and can search.
API keys live in a separate secrets.toml (0600), never in your shareable config. Env overrides supported.
mimalloc, models loaded only during a session, event-driven hotkeys, LTO + strip release builds.
macOS & Windows today — paste-or-type injection with a graceful clipboard fallback.
Paste a wall of text and it just works.
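As a rough illustration of the config-selected providers and the separate secrets file with env overrides described above, here is a minimal sketch in Rust. Everything named here is an assumption made for the example, not Holler's actual code: the `Transcriber` enum, the `HOLLER_*_API_KEY` variable names, and the helper functions are all hypothetical. It only shows the precedence rule (a non-empty environment value wins over the value read from secrets.toml) and leaves TOML parsing and file-permission checks out.

```rust
use std::str::FromStr;

/// Hypothetical identifiers for the two transcription providers named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transcriber {
    Deepgram,
    OpenAi,
}

impl FromStr for Transcriber {
    type Err = String;

    /// Parse the provider name as it might appear in a config file.
    fn from_str(name: &str) -> Result<Self, Self::Err> {
        match name.trim().to_ascii_lowercase().as_str() {
            "deepgram" => Ok(Transcriber::Deepgram),
            "openai" => Ok(Transcriber::OpenAi),
            other => Err(format!("unknown transcription provider: {}", other)),
        }
    }
}

impl Transcriber {
    /// Assumed environment variable names; the real ones may differ.
    pub fn env_var(self) -> &'static str {
        match self {
            Transcriber::Deepgram => "HOLLER_DEEPGRAM_API_KEY",
            Transcriber::OpenAi => "HOLLER_OPENAI_API_KEY",
        }
    }
}

/// Trim a candidate value and discard it if nothing is left.
fn non_empty(value: &str) -> Option<String> {
    let trimmed = value.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}

/// Precedence rule: a non-empty env override beats the secrets-file value.
pub fn resolve_key(env_value: Option<&str>, file_value: Option<&str>) -> Option<String> {
    env_value
        .and_then(non_empty)
        .or_else(|| file_value.and_then(non_empty))
}

/// Same rule, reading the override from the process environment.
pub fn key_from_process_env(provider: Transcriber, file_value: Option<&str>) -> Option<String> {
    let env = std::env::var(provider.env_var()).ok();
    resolve_key(env.as_deref(), file_value)
}
```

Keeping `resolve_key` free of any I/O makes the precedence easy to test on its own; a caller would pass in whatever it read from secrets.toml and let `key_from_process_env` supply the override.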
# 1. Build a double-clickable app bundle (release + sign)
scripts/bundle-macos.sh

# 2. Store your Deepgram API key (one time)
./Holler.app/Contents/MacOS/holler set-key deepgram <YOUR_KEY>

# 3. Launch it (menubar agent, no Dock icon)
open ./Holler.app
Grant the Accessibility permission (needed to paste at the cursor) and the Microphone permission on first launch, then hold ⌃⌥Space and talk. Windows builds ship as a self-contained ZIP; see the README.
Holler is actively developed and contributions are welcome. A few places we'd love a hand:
A Windows TTS backend + selection capture (read-aloud is macOS-only today).
An audio sink so cloud voices play on Windows/Linux, not just macOS.
A whisper-rs provider (large-v3-turbo, download-on-demand) — dictation with no network, no key.
An optional raw / cleaned / formatted pass behind an LlmProvider trait.
Audio, injection, and overlay backends for X11/Wayland.
Voices, settings UX, overlay layouts, and documentation are all fair game.
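For anyone considering the raw / cleaned / formatted pass from the list above, one possible shape for an `LlmProvider` trait is sketched below. This is a guess at an interface, not the project's design: the `PassLevel` enum, the `rewrite` method, the `LocalTidy` stand-in, and `post_process` are all hypothetical names. The stand-in does plain string tidying instead of calling a model, purely so the example runs offline.

```rust
/// Post-processing levels, named after the roadmap item.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PassLevel {
    Raw,
    Cleaned,
    Formatted,
}

/// Hypothetical trait shape; a real provider would call an LLM here.
pub trait LlmProvider {
    fn rewrite(&self, transcript: &str, level: PassLevel) -> Result<String, String>;
}

/// Illustration-only provider that tidies text without any model call.
pub struct LocalTidy;

/// Collapse runs of whitespace and trim both ends.
fn collapse_whitespace(text: &str) -> String {
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Uppercase the first character and ensure terminal punctuation.
fn capitalize_and_terminate(text: &str) -> String {
    let mut chars = text.chars();
    let mut out: String = match chars.next() {
        Some(first) => first.to_uppercase().chain(chars).collect(),
        None => return String::new(),
    };
    if !matches!(out.chars().last(), Some('.') | Some('!') | Some('?')) {
        out.push('.');
    }
    out
}

impl LlmProvider for LocalTidy {
    fn rewrite(&self, transcript: &str, level: PassLevel) -> Result<String, String> {
        match level {
            PassLevel::Raw => Ok(transcript.to_string()),
            PassLevel::Cleaned => Ok(collapse_whitespace(transcript)),
            PassLevel::Formatted => {
                Ok(capitalize_and_terminate(&collapse_whitespace(transcript)))
            }
        }
    }
}

/// The pass is optional: with no provider, a Raw level, or a provider
/// error, the transcript is returned unchanged.
pub fn post_process(
    provider: Option<&dyn LlmProvider>,
    transcript: &str,
    level: PassLevel,
) -> String {
    match provider {
        Some(p) if level != PassLevel::Raw => p
            .rewrite(transcript, level)
            .unwrap_or_else(|_| transcript.to_string()),
        _ => transcript.to_string(),
    }
}
```

Falling back to the untouched transcript on any error is a deliberate choice in this sketch: dictation should still land at the cursor even when the optional pass is unavailable.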