Speech & Transcription (71 skills)
🎖️Featured
41,621

Mcporter

Use the mcporter CLI to list, configure, authenticate, and call MCP servers and tools directly (over HTTP or stdio), including ad-hoc servers, config edits, and CLI/type generation.

🎖️Featured
31,978

OpenClaw YouTube Transcript

Transcribe YouTube videos to text by extracting captions and subtitles directly from the video URL using yt-dlp without audio processing.

🎖️Featured
18,448

Sag

ElevenLabs text-to-speech with a macOS say-style UX.

🎖️Featured
15,590

YouTube Transcript

Fetch and summarize YouTube video transcripts. Use when asked to summarize, transcribe, or extract content from YouTube videos. Fetches transcripts through a residential IP proxy to get around YouTube's blocking of cloud-provider IP addresses.

Local Whisper

Local speech-to-text using OpenAI Whisper. Runs fully offline once the model is downloaded. High-quality transcription with multiple model sizes.

elevenlabs-voices

High-quality voice synthesis with 18 personas, 32.

faster-whisper

Local speech-to-text using faster-whisper.

elevenlabs-tts

ElevenLabs TTS - the best ElevenLabs integration for OpenClaw.

Voice Transcribe

Transcribe audio files using OpenAI's gpt-4o-mini-transcribe model with vocabulary hints and text replacements. Requires uv (https://docs.astral.sh/uv/).

jarvis-voice

Metallic AI voice persona with TTS and visual transcript styling.

kokoro-tts

Generate spoken audio from text using the local Kokoro TTS engine.

ElevenLabs Speech-to-Text

Transcribe audio files using ElevenLabs Speech-to-Text (Scribe v2).

Mlx Whisper

Local speech-to-text with MLX Whisper (Apple Silicon optimized, no API key).

Transcribe audio files via OpenRouter using audio-capable models

Transcribe audio files via OpenRouter using audio-capable models (Gemini, GPT-4o-audio, etc.).

Gemini STT

Transcribe audio files using Google's Gemini API or Vertex AI.

Tts

Convert text to speech using the Hume AI (or OpenAI) API. Use when the user asks for an audio message, a voice reply, or to hear something spoken aloud.

Local Whisper

Free local speech-to-text for Telegram and WhatsApp using MLX Whisper on Apple Silicon. Private, no API costs.

Transcribe

Transcribe audio files to text using local Whisper (Docker). Use when receiving voice messages, audio files (.mp3, .m4a, .ogg, .wav, .webm), or when asked to transcribe audio content.

assemblyai-transcribe

Transcribe audio/video with AssemblyAI.

elevenlabs-agents

Create, manage, and deploy ElevenLabs.

Local STT (Nvidia Parakeet + Whisper Support)

Local STT with selectable backends - Parakeet (best accuracy) or Whisper (fastest, multilingual).

audio-gen

Generate audiobooks, podcasts, or educational audio content.

critical-article-writer

Generate draft articles, outlines.

audio-reply

Generate audio replies using TTS.

It helps you send voice messages to your AI assistant and can also make it talk.

Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription.

elevenlabs-transcribe

Transcribe audio to text using ElevenLabs.

Parakeet Stt

Local speech-to-text with NVIDIA Parakeet TDT 0.6B v3 (ONNX on CPU). 30x faster than Whisper, 25 languages, auto-detection, OpenAI-compatible API. Use when transcribing audio files, converting speech to text, or processing voice recordings locally without cloud APIs.

deepgram

Command-line interface for Deepgram speech-to-text.

announcer

Announce text throughout the house via AirPlay speakers using Airfoil +.

Speech To Text

Transcribe audio to text with Whisper models via inference.sh CLI. Models: Fast Whisper Large V3, Whisper V3 Large. Capabilities: transcription, translation,...

Voice

Convert text to speech using Microsoft Edge's TTS engine with customizable voices, direct playback, and automatic temporary file cleanup.

addis-assistant-stt

Provides Speech-to-Text (STT) and text.

Pocket Tts

Generate high-quality English speech offline on CPU using 8 built-in voices or custom voice cloning with Kyutai's Pocket TTS model.

inworld-tts

Text-to-speech via Inworld.ai API.

Voicenotes

Sync and access voice notes from Voicenotes.com. Use when the user wants to retrieve their voice recordings, transcripts, and AI summaries from Voicenotes. Supports fetching notes, syncing to markdown, and searching transcripts.

claw-voice

You are connected to a live user session via voice.

Transcribe Audio with Parakeet MLX

Local speech-to-text with Parakeet MLX (ASR) for Apple Silicon (no API key).

clonev

Clone any voice and generate speech using Coqui XTTS v2.

cult-of-carcinization

Give your agent a voice — and ears.

deepdub-tts

Generate speech audio using Deepdub and attach it as a MEDIA.

chichi-speech

A RESTful service for high-quality text-to-speech using Qwen3.

lnbits

Manage LNbits Lightning Wallet (Balance, Pay, Invoice)

Voicenotes Official

This official skill from the Voicenotes team gives OpenClaw access to new APIs and the ability to search semantically, retrieve full transcripts, filter by t...

tl;dw - YouTube Video Summarizer

Extracts YouTube video transcripts and provides concise summaries highlighting main points, arguments, and conclusions without watching the full video.

Openai Tts.Bak 2026 01 28T18:01:23+10:30

Text-to-speech via OpenAI Audio Speech API.

speech-recognition

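General-purpose speech recognition skill. Supports multiple audio formats (ogg/mp3/wav/m4a) and uses the SiliconFlow SenseVoice API for speech-to-text. Triggered when the user sends a voice message or an audio file, or asks for audio to be transcribed.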

freshbooks-cli

FreshBooks CLI for managing invoices, clients, and billing.

Text To Speech

Convert text to natural speech with DIA TTS, Kokoro, Chatterbox, and more via inference.sh CLI. Models: DIA TTS (conversational), Kokoro TTS, Chatterbox, Hig...

AssemblyAI Transcriber

Transcribe audio files with speaker diarization (who speaks when). Supports 100+ languages, automatic language detection, and timestamps. Use for meetings, interviews, podcasts, or voice messages. Requires AssemblyAI API key.

Whisper Transcribe

Transcribe audio files to text using OpenAI Whisper. Supports speech-to-text with auto language detection, multiple output formats (txt, srt, vtt, json), batch processing, and model selection (tiny to large). Use when transcribing audio recordings, podcasts, voice messages, lectures, meetings, or any audio/video file to text. Handles mp3, wav, m4a, ogg, flac, webm, opus, aac formats.

eternal-haven-lore-pack

Eternal Haven Chronicles lore + mythic persona pack.

agent-voice

Command-line blogging platform for AI agents.

akaunting

Interact with Akaunting open-source accounting software via REST API.

auto-whisper-safe

RAM-safe voice transcription with auto-chunking — works on 16GB machines without crashes.

brw-de-ai-ify

Remove AI-generated jargon and restore human voice to text.

dellight-cro-revenue-ops

DELLIGHT.AI is an AI startup in DIFC, Dubai.

documents-ai

Real-time OCR and data extraction API by Veryfi.

doubao-api-open-tts

Text-to-Speech service using Doubao (Volcano Engine)

duby

Convert text to speech using Duby.so API.

eachlabs-voice-audio

TTS, STT, voice conversion using ElevenLabs, Whisper, RVC.

easyverein-api

Work with the easyVerein v2.0 REST API.

elevenlabs-media

ElevenLabs music generation.

feishu-minutes

Fetch info, stats, transcript, and media from Feishu.

gettr-transcribe-summarize

Download audio from a GETTR post.

hebrew-nikud

Hebrew nikud (vowel points) reference for AI agents.

her-voice

Give your agent a voice.

miranda-sag

ElevenLabs text-to-speech with a macOS say-style UX.

norman-categorize-transactions

Review and categorize uncategorized bank transactions, match them with invoices, and verify bookkeeping entries.

norman-monthly-reconciliation

Perform a complete monthly financial reconciliation - review all transactions, match invoices, check outstanding.

ressemble

Text-to-Speech and Speech-to-Text integration using Resemble AI HTTP API.

siliconflow-tts-gen

Text-to-Speech using SiliconFlow API (CosyVoice2)