🎙️

Speech & Transcription

(71)

🎖️Featured

41,621

Mcporter

Use the mcporter CLI to list, configure, auth, and call MCP servers/tools directly (HTTP or stdio), including ad-hoc servers, config edits, and CLI/type generation.

🎙️Speech & Transcription/mcporter

🎖️Featured

31,978

OpenClaw YouTube Transcript

Transcribe YouTube videos to text by extracting captions and subtitles directly from the video URL using yt-dlp without audio processing.

🎙️Speech & Transcription/openclaw-youtube-transcript

🎖️Featured

18,448

Sag

ElevenLabs text-to-speech with a macOS-style say UX.

🎙️Speech & Transcription/sag

🎖️Featured

15,590

YouTube Transcript

Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Handles transcript fetching via residential IP proxy to bypass YouTube's cloud IP blocks.

🎙️Speech & Transcription/youtube-transcript

Local Whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline after model download. High quality transcription with multiple model sizes.

🎙️Speech & Transcription/local-whisper

elevenlabs-voices

High-quality voice synthesis with 18 personas, 32.

🎙️Speech & Transcription/elevenlabs-voices

faster-whisper

Local speech-to-text using faster-whisper.

🎙️Speech & Transcription/faster-whisper

elevenlabs-tts

ElevenLabs TTS - the best ElevenLabs integration for OpenClaw.

🎙️Speech & Transcription/elevenlabs-tts

Voice Transcribe

Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).

🎙️Speech & Transcription/voice-transcribe

jarvis-voice

Metallic AI voice persona with TTS and visual transcript styling.

🎙️Speech & Transcription/jarvis-voice

kokoro-tts

Generate spoken audio from text using the local Kokoro TTS engine.

🎙️Speech & Transcription/kokoro-tts

ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

🎙️Speech & Transcription/elevenlabs-stt

Mlx Whisper

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

🎙️Speech & Transcription/mlx-whisper

Transcribe audio files via OpenRouter using audio-capable models

Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc).

🎙️Speech & Transcription/openrouter-transcribe

Gemini STT

Transcribe audio files using Google's Gemini API or Vertex AI.

🎙️Speech & Transcription/gemini-stt

Tts

Convert text to speech using the Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something "de vive voix".

🎙️Speech & Transcription/tts

Local Whisper

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

🎙️Speech & Transcription/whisper-mlx-local

Transcribe

Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.

🎙️Speech & Transcription/transcribe

assemblyai-transcribe

Transcribe audio/video with AssemblyAI.

🎙️Speech & Transcription/assemblyai-transcribe

elevenlabs-agents

Create, manage, and deploy ElevenLabs.

🎙️Speech & Transcription/elevenlabs-agents

Local STT (Nvidia Parakeet + Whisper Support)

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

🎙️Speech & Transcription/local-stt

audio-gen

Generate audiobooks, podcasts, or educational audio content.

🎙️Speech & Transcription/audio-gen

critical-article-writer

Generate draft articles, outlines.

🎙️Speech & Transcription/critical-article-writer

audio-reply

Generate audio replies using TTS.

🎙️Speech & Transcription/audio-reply-skill

Send voice messages to your AI assistant and let it talk back

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

🎙️Speech & Transcription/elevenlabs-voice

elevenlabs-transcribe

Transcribe audio to text using ElevenLabs.

🎙️Speech & Transcription/elevenlabs-transcribe

Parakeet Stt

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

🎙️Speech & Transcription/parakeet-stt

deepgram

Command-line interface for Deepgram speech-to-text.

🎙️Speech & Transcription/deepgram

announcer

Announce text throughout the house via AirPlay speakers using Airfoil +.

🎙️Speech & Transcription/announcer

Speech To Text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

🎙️Speech & Transcription/speech-to-text

Voice

Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.

🎙️Speech & Transcription/voice

addis-assistant-stt

Provides Speech-to-Text (STT) and text.

🎙️Speech & Transcription/addis-assistant-stt

Pocket Tts

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.

🎙️Speech & Transcription/pocket-tts

inworld-tts

Text-to-speech via Inworld.ai API.

🎙️Speech & Transcription/inworld-tts

Voicenotes

Sync and access voice notes from Voicenotes.com. Use when the user wants to retrieve their voice recordings, transcripts, and AI summaries from Voicenotes. Supports fetching notes, syncing to markdown, and searching transcripts.

🎙️Speech & Transcription/voicenotes

claw-voice

You are connected to a live user session via voice.

🎙️Speech & Transcription/claw-voice

Transcribe Audio with Parakeet MLX

Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).

🎙️Speech & Transcription/parakeet-mlx

clonev

Clone any voice and generate speech using Coqui XTTS v2.

🎙️Speech & Transcription/clonev

cult-of-carcinization

Give your agent a voice — and ears.

🎙️Speech & Transcription/cult-of-carcinization

deepdub-tts

Generate speech audio using Deepdub and attach it as a MEDIA.

🎙️Speech & Transcription/deepdub-tts

chichi-speech

A RESTful service for high-quality text-to-speech using Qwen3.

🎙️Speech & Transcription/chichi-speech

lnbits

Manage an LNbits Lightning wallet (balance, pay, invoice).

🎙️Speech & Transcription/lnbits

Voicenotes Official

This official skill from the Voicenotes team gives OpenClaw access to new APIs and the ability to search semantically, retrieve full transcripts, filter by t...

🎙️Speech & Transcription/voicenotes-official

tl;dw - YouTube Video Summarizer

Extracts YouTube video transcripts and provides concise summaries highlighting main points, arguments, and conclusions without watching the full video.

🎙️Speech & Transcription/tldw

Openai Tts.Bak 2026 01 28T18:01:23+10:30

Text-to-speech via OpenAI Audio Speech API.

🎙️Speech & Transcription/openai-tts-bak-2026-01-28t18-01-23-10-30

speech-recognition

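General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggers when the user sends a voice message or audio file, or needs audio transcribed.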

🎙️Speech & Transcription/speech-recognition

freshbooks-cli

FreshBooks CLI for managing invoices, clients, and billing.

🎙️Speech & Transcription/freshbooks-cli

Text To Speech

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...

🎙️Speech & Transcription/text-to-speech

AssemblyAI Transcriber

Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key.

🎙️Speech & Transcription/assemblyai-transcriber

Whisper Transcribe

Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.

🎙️Speech & Transcription/whisper-transcribe