Documentation
Everything you need to know.
Learn how to configure Soundvibes, set up your ideal workflow, and troubleshoot common issues.
Quick Start
Get up and running with just two commands. Soundvibes uses a daemon/client architecture where the daemon handles transcription and the client sends toggle commands.
Step 1
Start the daemon
sv daemon start
Launches the background service and downloads models on first run.
Step 2
Toggle recording
sv
Run this to start/stop capture. Bind it to a hotkey for easy access.
Frequently Asked Questions
Does it run fully offline?
Yes. After the first model download, everything stays local.
How do I start and stop capture?
Start the daemon with sv daemon start, then bind sv to a hotkey and toggle on/off.
How do I install it?
Run the install script: curl -fsSL https://raw.githubusercontent.com/kejne/soundvibes/main/install.sh | sh. It handles everything automatically.
Installation
Automatic Install
The install script auto-detects your distribution, installs dependencies, downloads the binary, and sets up configuration.
curl -fsSL https://raw.githubusercontent.com/kejne/soundvibes/main/install.sh | sh
System Requirements
- Linux x86_64
- Working microphone
- Vulkan libraries (optional)
- wtype (Wayland) or xdotool (X11) for injection
Runtime Dependencies (Pre-built Binary)
If you install via the install script or download the binary from GitHub Releases, you only need these runtime libraries:
Arch Linux
sudo pacman -Syu alsa-lib vulkan-icd-loader
Ubuntu/Debian
sudo apt-get install -y libasound2 libvulkan1 mesa-vulkan-drivers
Fedora
sudo dnf install -y alsa-lib vulkan-loader mesa-vulkan-drivers
GPU Drivers (Optional)
For GPU acceleration, also install GPU drivers. The Ubuntu/Debian and Fedora commands above include Mesa Vulkan drivers for AMD/Intel. For NVIDIA, install the proprietary drivers with Vulkan support. See the GPU Acceleration section for details.
Building from source? See CONTRIBUTING.md for development dependencies.
Setting Up Your Workflow
The power of Soundvibes comes from binding the sv command to a hotkey. Here are setup instructions for popular desktop environments.
GNOME / KDE / XFCE
Go to Settings → Keyboard → Custom Shortcuts. Add a new shortcut with:
- Command: sv
- Binding: Your preferred key (e.g., Ctrl+Alt+Space, F12)
i3 / Sway (Window Managers)
Add to your config file:
bindsym $mod+Shift+v exec sv
Hyprland
Add to hyprland.conf:
bind = $mainMod, V, exec, sv
Auto-start the Daemon
To start the daemon automatically on login, add to your desktop environment's startup applications:
sv daemon start
Or use systemd: systemctl --user enable --now soundvibes (if you ran the install script with service setup)
Configuration File
Soundvibes uses a TOML configuration file located at ~/.config/soundvibes/config.toml.
CLI flags override config file values, which override defaults.
Complete Example
# Model behavior
download_model = true # Allow auto-download on first run
model_size = "small" # tiny, base, small, medium, large, auto
# Transcription settings
language = "en" # Default active language context
model_variants = "en" # en, multilingual, both
device = "default" # Audio device name
audio_host = "alsa" # default, alsa
sample_rate = 16000 # Hz (16000 recommended)
# Output settings
format = "plain" # plain, jsonl
mode = "inject" # stdout, inject
# VAD (Voice Activity Detection) settings
vad = "on" # on, off (or true/false)
vad_silence_ms = 1200 # Silence timeout in milliseconds
vad_threshold = 0.01 # Energy threshold (0.001 - 0.1)
vad_chunk_ms = 100 # Chunk size in milliseconds
# Debug settings
debug_audio = false
debug_vad = false
dump_audio = false # Save captured audio to WAV
list_devices = false
Model Variants
Soundvibes keeps model contexts per variant (English-only and multilingual), not per language code.
Use model_variants to preload en, multilingual, or both.
If omitted, Soundvibes derives this from language (English -> en, otherwise multilingual).
Use both to keep both variants warm for fast switching.
You can switch active language at runtime with sv daemon set-language --lang <CODE> or sv --toggle-language <CODE>.
Priority 1
CLI Flags
Highest priority, overrides everything
Priority 2
Config File
~/.config/soundvibes/config.toml
Priority 3
Defaults
Built-in fallback values
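The three-level precedence above can be pictured as a layered lookup. The following Python sketch is illustrative only and is not how Soundvibes implements it. The helper name `resolve_settings` and the sample inputs are assumptions made for this example, while the keys and default values come from the configuration example and CLI table.

```python
from collections import ChainMap

# Built-in defaults (values taken from the documented defaults).
DEFAULTS = {"model_size": "small", "language": "en", "mode": "inject", "vad_silence_ms": 1200}


def resolve_settings(cli_flags, config_file):
    """Merge settings so CLI flags beat config values, which beat defaults.

    Hypothetical helper for illustration; flags that were not passed are None.
    """
    # Drop flags the user did not pass so they do not mask lower layers.
    given = {key: value for key, value in cli_flags.items() if value is not None}
    # ChainMap searches its maps left to right, so earlier maps win.
    return dict(ChainMap(given, config_file, DEFAULTS))


settings = resolve_settings(
    cli_flags={"mode": "stdout", "language": None},
    config_file={"language": "fr", "model_size": "base"},
)
print(settings["mode"], settings["language"], settings["model_size"], settings["vad_silence_ms"])
```

Here mode comes from the CLI, language and model_size from the config file, and vad_silence_ms from the defaults.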
CLI Reference
Global Options
| Option | Default | Description |
|---|---|---|
| --language | en | Transcription language code |
| --toggle-language | - | Override language for a single toggle call |
| --download-model | true | Allow downloading missing models automatically |
| --model-size | small | Model size for all variants: tiny, base, small, medium, large, auto |
| --model-variants | derived | Preload variants: en, multilingual, both |
| --device | - | Audio input device name |
| --audio-host | alsa | default, alsa |
| --sample-rate | 16000 | Sample rate in Hz |
| --format | plain | Output format: plain, jsonl |
| --mode | inject | Output mode: stdout, inject |
| --vad | on | Voice Activity Detection: on, off |
| --vad-silence-ms | 1200 | VAD silence timeout (ms) |
| --vad-threshold | 0.010 | VAD energy threshold |
| --vad-chunk-ms | 100 | VAD chunk size (ms) |
| --list-devices | false | List available input devices |
| --debug-audio, --debug-vad, --dump-audio | false | Debug logging options |
Subcommands
sv daemon start
Start the background daemon process
sv daemon stop
Stop the daemon gracefully
sv daemon status
Show daemon state and active language
sv daemon set-language --lang <CODE>
Switch active language without toggling recording
sv (no arguments)
Send toggle command to daemon (start/stop recording)
Model Settings
Soundvibes uses Whisper models from Hugging Face. Models download automatically on first run, or you can download them manually.
Current Model Policy
Soundvibes uses one configured model size for all daemon language contexts.
Set model_size to tiny, base, small, medium, or large. The model variant for each language is then selected automatically and loaded from the local cache.
Model Language Variants
- multilingual: Uses `ggml-<size>.bin` for all non-English languages (and English fallback).
- en: Uses `ggml-<size>.en.bin` for English-only transcription.
- both: Preloads both variants so language switching has no first-use delay.
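The variant rules above can be summarized in a few lines of Python. This is a sketch of the documented behavior, not Soundvibes's own code. The function names are hypothetical, and it assumes that "English" means the language code "en".

```python
def variant_for_language(language):
    """English maps to the English-only variant; everything else is multilingual."""
    return "en" if language == "en" else "multilingual"


def model_filename(size, variant):
    """Build the file name from the documented ggml-<size>[.en].bin pattern."""
    suffix = ".en" if variant == "en" else ""
    return f"ggml-{size}{suffix}.bin"


def variants_to_preload(language, model_variants=None):
    """Return the variants to load; derive from language when model_variants is omitted."""
    if model_variants is None:
        return [variant_for_language(language)]
    if model_variants == "both":
        return ["en", "multilingual"]
    return [model_variants]


# With model_variants = "both", both files are needed regardless of the active language.
for variant in variants_to_preload("fr", "both"):
    print(model_filename("small", variant))
```

With model_size = "small" and model_variants = "both", this prints ggml-small.en.bin followed by ggml-small.bin.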
Model Storage
Models are stored in: ~/.local/share/soundvibes/models/
You can override the download URL with the SV_MODEL_BASE_URL environment variable for custom mirrors or offline setups.
Per-Language Model Contexts
The daemon keeps model contexts by variant, while active language remains a runtime setting.
Use model_variants = "both" to preload both contexts, then switch instantly with sv daemon set-language --lang <CODE>.
Model variant selection is automatic per language key. English uses the English-optimized model variant, while other languages use multilingual variants.
Audio & VAD Configuration
Audio Devices
List available devices to find the correct name for your microphone:
sv --list-devices
Then set it in your config: device = "Your Device Name"
Voice Activity Detection (VAD)
VAD automatically trims silence from the end of recordings. Configure these settings to match your environment:
| Setting | Default | Description |
|---|---|---|
| vad_silence_ms | 1200 | How long to wait after speech stops before ending (ms) |
| vad_threshold | 0.010 | Energy threshold for detecting speech (0.001 - 0.1) |
| vad_chunk_ms | 100 | Audio chunk size for VAD analysis (ms) |
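To build intuition for how these three settings interact, here is a minimal Python sketch of an energy-based end-of-speech detector. It is an assumption-laden illustration: the documentation only says the threshold is an energy threshold, so the use of RMS energy on float samples in [-1.0, 1.0], the stopping rule, and the name `find_end_of_speech` are all hypothetical, and Soundvibes's actual VAD may work differently.

```python
import math


def rms(chunk):
    """Root-mean-square energy of a chunk of float samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def find_end_of_speech(samples, sample_rate=16000, vad_threshold=0.01,
                       vad_silence_ms=1200, vad_chunk_ms=100):
    """Return the sample index where capture would stop, or None if it never does.

    Assumed rule: once speech has been heard, stop after vad_silence_ms of
    consecutive chunks whose RMS energy is below vad_threshold.
    """
    chunk_len = sample_rate * vad_chunk_ms // 1000
    chunks_needed = math.ceil(vad_silence_ms / vad_chunk_ms)
    heard_speech = False
    silent_chunks = 0
    for start in range(0, len(samples) - chunk_len + 1, chunk_len):
        if rms(samples[start:start + chunk_len]) >= vad_threshold:
            heard_speech = True
            silent_chunks = 0  # speech resets the silence counter
        elif heard_speech:
            silent_chunks += 1
            if silent_chunks >= chunks_needed:
                return start + chunk_len
    return None


# One second of a 440 Hz tone as stand-in "speech", then two seconds of silence.
rate = 16000
tone = [0.2 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
signal = tone + [0.0] * (2 * rate)
stop = find_end_of_speech(signal, sample_rate=rate)
print(stop / rate)  # seconds into the signal at which capture would end
```

In this model, raising vad_threshold makes more chunks count as silence, and lowering vad_silence_ms ends the recording sooner after speech stops.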
Quiet Environment
Lower threshold for better sensitivity:
vad_threshold = 0.005
Noisy Environment
Higher threshold to reduce false triggers:
vad_threshold = 0.02
Output Modes & Formats
Mode: inject
Type text at cursor (default)
Transcribed text is automatically typed at your cursor position. Uses wtype on Wayland or xdotool on X11.
Mode: stdout
Print to terminal
Output goes to standard output. Useful for piping to other commands or scripts.
Format: plain
Simple text output
Just the transcribed text, no extra formatting or metadata.
Format: jsonl
Structured JSON lines
Each utterance is emitted as one JSON object per line with the fields type, text, timestamp, utterance, and duration_ms.
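JSONL output is convenient to consume from a script, one line at a time. The Python sketch below shows one way to pull out the transcribed text. The field names come from the description above, but the sample record is invented for this example (the value types and the "final" type value are assumptions), and `extract_texts` is a hypothetical helper.

```python
import io
import json


def extract_texts(stream):
    """Yield the text field of every well-formed JSON line in the stream."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(record, dict) and "text" in record:
            yield record["text"]


# Invented sample input; real values and types may differ.
sample = io.StringIO(
    '{"type": "final", "text": "hello world", "timestamp": "2024-01-01T12:00:00Z", '
    '"utterance": 1, "duration_ms": 850}\n'
    "not json\n"
)
texts = list(extract_texts(sample))
print(texts)
```

To process live output, pass sys.stdin instead of the sample stream and pipe the JSONL into the script.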
Text Injection Requirements
For mode = "inject" to work, you need the appropriate tool for your display server:
- Wayland: Install wtype (virtual keyboard)
- X11: Install xdotool (XTest extension)
Daemon Management
The daemon runs in the background, listening on a Unix socket for control commands. It handles audio capture, VAD processing, transcription, and active language state.
Socket Location
${XDG_RUNTIME_DIR}/soundvibes/sv.sock
Usually resolves to /run/user/1000/soundvibes/sv.sock
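A script that wants to know whether the daemon has created its socket can resolve the same path. The following Python sketch assumes the fallback to /run/user/<uid> when XDG_RUNTIME_DIR is unset, and the helper names `socket_path` and `socket_exists` are hypothetical. A socket file being present does not guarantee that the daemon is responsive.

```python
import os
import stat
from pathlib import Path


def socket_path(env=None):
    """Return ${XDG_RUNTIME_DIR}/soundvibes/sv.sock, falling back to /run/user/<uid>."""
    env = os.environ if env is None else env
    runtime_dir = env.get("XDG_RUNTIME_DIR") or f"/run/user/{os.getuid()}"
    return Path(runtime_dir) / "soundvibes" / "sv.sock"


def socket_exists(path):
    """True if path exists and is a Unix socket; False if it is missing or unreadable."""
    try:
        return stat.S_ISSOCK(path.stat().st_mode)
    except OSError:
        return False


path = socket_path({"XDG_RUNTIME_DIR": "/run/user/1000"})
print(path)
print(socket_exists(path))
```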
Lifecycle Commands
sv daemon start
Launch daemon
sv daemon stop
Graceful shutdown
sv daemon status
Current state + language
sv daemon set-language --lang fr
Switch active language
Language-aware Hotkeys
You can keep your default toggle command and add dedicated per-language hotkeys.
sv
sv --toggle-language fr
sv --toggle-language sv
Auto-start on Login
The install script can set up a systemd user service. Or add to your desktop environment's startup:
sv daemon start
GPU Acceleration
Soundvibes uses Vulkan for GPU acceleration, providing significant speedups for transcription. It automatically falls back to CPU if GPU is unavailable.
GPU Drivers
AMD: vulkan-radeon (Arch) or mesa-vulkan-drivers (Ubuntu/Fedora)
NVIDIA: nvidia-utils (Arch) or nvidia-driver with Vulkan support
Intel: vulkan-intel (limited support)
Verify Vulkan
vulkaninfo --summary
Should show your GPU in the device list
Performance Comparison
GPU acceleration provides a 3-5x speedup for larger models.
Environment Variables
| Variable | Purpose |
|---|---|
| SV_MODEL_PATH | Override default model path |
| SV_MODEL_BASE_URL | Custom model download mirror (e.g., for offline/airgapped setups) |
| XDG_CONFIG_HOME | Config directory (default: ~/.config) |
| XDG_DATA_HOME | Data directory for models (default: ~/.local/share) |
| XDG_RUNTIME_DIR | Runtime directory for socket (default: /run/user/UID) |
| SV_HARDWARE_TESTS, SV_GPU_TESTS, etc. | Test environment flags (see testing docs) |
Troubleshooting
Daemon Issues
"Connection refused" or "No such file"
The daemon isn't running. Start it with: sv daemon start
Daemon won't start
Check if another instance is running: ps aux | grep sv. Kill stale processes if needed.
Permission denied on socket
Check that XDG_RUNTIME_DIR is set and writable. It is usually /run/user/<UID>, for example /run/user/1000.
Model Issues
Model download fails
Check your internet connection or set SV_MODEL_BASE_URL to a mirror. You can also download models manually to ~/.local/share/soundvibes/models/.
"Model file not found"
Enable auto-download with download_model = true, then restart the daemon so that missing language models can be fetched.
Exit Codes
Debugging
Debug Flags
Enable detailed logging to diagnose issues:
--debug-audio
Log audio capture details, device selection, and sample rates
--debug-vad
Log VAD decisions, energy levels, and trimming behavior
--dump-audio
Save captured audio to WAV file for inspection
Example Debug Session
# Start daemon with debug logging
sv daemon start --debug-audio --debug-vad
# In another terminal, toggle capture
sv
# Check logs in ~/.local/share/soundvibes/logs/ or console output
Audio Problems
Device Not Found
List all available devices and verify your microphone is detected:
sv --list-devices
If your device isn't listed, check ALSA/PulseAudio configuration and ensure the device isn't muted.
No Audio Captured
- Check microphone isn't muted in your system mixer (alsamixer, pavucontrol)
- Verify the correct device is selected with --device or config
- Use --dump-audio to verify audio is being captured
- Check VAD threshold isn't too high for quiet speech
Poor Transcription Quality
- Use a good-quality microphone and reduce background noise for cleaner input
- Ensure language matches your spoken language
- Speak clearly and at a consistent volume
Text Injection Not Working
- Verify wtype (Wayland) or xdotool (X11) is installed
- On Wayland, ensure your compositor supports virtual keyboard protocols
- Some applications (especially browsers) may block synthetic input for security
- Use --mode stdout as a workaround
Still have questions? Check the technical documentation on GitHub.