Introduction
Cohere has released Transcribe, an open-source automatic speech recognition (ASR) model that immediately claimed the top spot on the Hugging Face Open ASR Leaderboard. With an average word error rate (WER) of just 5.42%, Transcribe outperforms industry leaders including OpenAI Whisper Large v3 (7.44%) and ElevenLabs Scribe v2 (5.83%).
Key Features
Industry-Leading Accuracy
Transcribe achieves the lowest WER on Hugging Face’s leaderboard:
- Average WER: 5.42% (vs Whisper Large v3’s 7.44%)
- Outperforms Whisper on all tested languages
- 64% preference rate in human evaluations vs Whisper Large v3
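For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the leaderboard's scoring code: the function name `word_error_rate` is hypothetical, and the function only splits on whitespace, whereas benchmark pipelines usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not benchmark data): one substitution and one deletion.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over lazy dog"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

Here two errors against a nine-word reference give a WER of about 22%. A 5.42% average means roughly one word in eighteen is wrong across the benchmark sets.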
Multi-Language Support
Supports 14 major languages including:
- English, French, Chinese, Arabic, Japanese
- Plus 9 additional languages
- Consistent accuracy across supported languages
Open-Source Freedom
Licensed under Apache 2.0:
- Free for commercial use
- Attribution required when redistributing (retain the license text, copyright notices, and any NOTICE file)
- Can be modified and distributed
- Available on Hugging Face
Performance Comparison
| Model | Avg WER | Languages | License |
|---|---|---|---|
| Cohere Transcribe | 5.42% | 14 | Apache 2.0 |
| OpenAI Whisper Large v3 | 7.44% | 100+ | MIT |
| ElevenLabs Scribe v2 | 5.83% | 99 | Proprietary |
| Qwen3-ASR-1.7B | 5.76% | 8 | Apache 2.0 |
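Absolute WER gaps can understate or overstate how far apart two models are, so it helps to express them as relative error reductions. The snippet below is a small sketch that uses only the average WER figures from the table above. The helper name `relative_wer_reduction` is hypothetical.

```python
# Average WER (%) as listed in the comparison table.
avg_wer = {
    "Cohere Transcribe": 5.42,
    "OpenAI Whisper Large v3": 7.44,
    "ElevenLabs Scribe v2": 5.83,
    "Qwen3-ASR-1.7B": 5.76,
}


def relative_wer_reduction(baseline: float, candidate: float) -> float:
    """Percentage of the baseline's errors that the candidate eliminates."""
    return (baseline - candidate) / baseline * 100.0


candidate = avg_wer["Cohere Transcribe"]
for name, wer in avg_wer.items():
    if name == "Cohere Transcribe":
        continue
    reduction = relative_wer_reduction(wer, candidate)
    print(f"vs {name}: {reduction:.1f}% fewer word errors")
```

On these figures, Transcribe makes roughly 27% fewer word errors than Whisper Large v3, about 7% fewer than Scribe v2, and about 6% fewer than Qwen3-ASR-1.7B.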
Pricing
| Option | Cost | Best For |
|---|---|---|
| Hugging Face Free | Free | Testing and development |
| Cohere API | Pay-per-use | Production apps |
| Self-hosted | Hardware only | Enterprise control |
Pros
- Best-in-class accuracy among open models
- Fully open-source with Apache 2.0 license
- Fast inference optimized for real-time use
- Enterprise-ready with production API
- Strong multilingual performance
Cons
- Limited language coverage (14 vs Whisper’s 100+)
- Requires technical setup for self-hosting
- API costs for production usage
Who Should Use It
Transcribe is ideal for:
- Voice assistant developers needing high accuracy
- Call center analytics platforms
- Accessibility tool builders
- Enterprises requiring on-premise ASR
- Researchers studying speech recognition
Conclusion
Cohere Transcribe represents a significant leap in open-source speech recognition. With an average WER of 5.42%, it edges out proprietary systems such as ElevenLabs Scribe v2 (5.83%) on the Hugging Face Open ASR Leaderboard while retaining the flexibility of Apache 2.0 licensing.
Rating: 4.7/5