What Is Whisper? The Free AI Speech-to-Text Standard, Model Accuracy Compared

Jun 8, 2026 · bakecut

TL;DR: Whisper is a speech recognition AI that OpenAI released for free, and a large share of today's subtitle tools run on it. Pick Small for everyday videos, Medium when there's lots of technical vocabulary, and the Large family when you need maximum accuracy.

When you shop around for subtitle software, the phrase "powered by Whisper" keeps coming up. Once you know what it means, you'll have a much better eye for choosing tools.

What is Whisper?

Whisper is a speech recognition AI that OpenAI, the company behind ChatGPT, released for free in 2022. Trained on roughly 680,000 hours of audio, it can transcribe more than 90 languages, including English.

The key word is "released." Anyone can take it and use it, so a huge number of subtitle tools use Whisper as their internal engine. That's why transcription quality is often similar across different tools. The real differences come from the editing features and workflow, not the engine.

Model size: the real variable behind transcription quality

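Whisper isn't one thing; it comes in several model sizes. As a rule, bigger means more accurate but slower. The exception is the Turbo variant of Large, which is tuned to run faster than its size suggests: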

Model | Size | Speed | Accuracy | Best for
Tiny | ~75MB | Very fast | Low | Rough drafts, low-spec machines
Base | ~145MB | Very fast | Slightly low | Short, clean audio
Small | ~490MB | Fast | Solid | The default for everyday videos
Medium | ~1.5GB | Slow | Good | Technical terms, fast talkers
Large (Turbo) | ~1.6GB | Medium | Best | Noisy audio, final cuts

If you've ever felt "auto-captions are inaccurate," the culprit is most likely a small model, not the tool. For accuracy factors beyond the model, see 5 ways to improve accuracy.

Cloud processing vs on-device processing

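The same Whisper behaves differently depending on where it runs. Cloud tools run it on their own servers, so your video has to be uploaded and you need an internet connection. On-device tools run the model on your own computer, so nothing is uploaded and transcription works offline once the model is downloaded.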

bakecut takes the on-device approach on Mac and Windows. You can choose between 5 Whisper models (tiny through large turbo), and your video is never uploaded.

Can I run it myself?

If you're a developer, you can run it directly for free. The common approach is to use an accelerated build such as faster-whisper in a Python environment. The catch is that what you get out is only timed subtitle text (typically saved as an SRT file), so to place it on a video and style it, you still need an editor. If the command line isn't your thing, a program with Whisper built in gets you there faster. For options, see our free subtitle software comparison.
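For the curious, here is a minimal sketch of what the do-it-yourself route can look like with faster-whisper. It assumes the package is installed (pip install faster-whisper) and that CPU inference with int8 quantization is acceptable. The names Cue, srt_timestamp, to_srt, and transcribe_to_srt are made up for this illustration and are not part of the library.

```python
from typing import Iterable, NamedTuple


class Cue(NamedTuple):
    """One subtitle entry (hypothetical helper type for this sketch)."""
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Join cues into SRT text: index, time range, caption, blank line."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}"
        blocks.append(f"{index}\n{time_range}\n{cue.text.strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(audio_path: str, model_size: str = "small") -> str:
    """Transcribe an audio file and return SRT text.

    Assumption: CPU with int8 quantization; pick other settings for a GPU.
    """
    # Imported lazily so the helpers above work without faster-whisper.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus audio info.
    segments, _info = model.transcribe(audio_path)
    return to_srt(Cue(s.start, s.end, s.text) for s in segments)
```

Calling transcribe_to_srt("talk.mp3") (a placeholder file name) downloads the Small model on first use and returns the subtitle text, which you can write to a .srt file. Swapping model_size for "medium" trades speed for accuracy, in line with the table above.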

How accurate is it in practice?

As a rule of thumb, clearly spoken speech in a quiet room reaches around 95% accuracy even with the Small model. Names, slang, and brand names still get mangled regularly, so the realistic workflow is "transcribe, then batch-fix proper nouns." Final typo cleanup is on you no matter which tool you use, which is why picking an editor that makes corrections easy matters.
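If you run Whisper yourself, the "batch-fix proper nouns" step can be scripted. The sketch below is one possible approach, not how any particular editor does it: it assumes you keep your own dictionary of known mis-transcriptions, and the function name fix_proper_nouns and the sample entries are invented for illustration.

```python
import re


def fix_proper_nouns(srt_text: str, corrections: dict[str, str]) -> str:
    """Replace each known mis-transcription with its correct spelling.

    Matching is case-insensitive and limited to whole words, so an entry
    like "bake cut" does not touch a longer phrase such as "bake cutter".
    """
    if not corrections:
        return srt_text
    # Longest entries first, so multi-word keys win over shorter overlaps.
    keys = sorted(corrections, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)",
        flags=re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in corrections.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], srt_text)
```

For example, with corrections = {"bake cut": "bakecut", "open ai": "OpenAI"}, a caption reading "Welcome to Bake Cut, built on Open AI Whisper." comes back with both names fixed. Keep the dictionary to words and phrases; entries made of digits could also match inside the timestamp lines.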

Wrap-up: the easiest way to use Whisper

If you want Whisper's accuracy without touching a command line, bakecut is the fastest route. Pick from 5 models with a click, go from transcription to caption design to export in one program, and your video never leaves your computer. Available on Mac and Windows.

FAQ

Is Whisper really free?

Yes. OpenAI released both the code and the models, so anyone can use them for free. What costs money isn't Whisper itself but the services that run it on their servers for you.

Which Whisper model should I pick?

Start with Small for everyday videos. If typos bother you, move up to Medium; if there's a lot of noise or it's a final cut, go with the Large family.

Does it work without internet?

If you run it on your own computer, transcription works fully offline once the model is downloaded. Cloud-based tools require an internet connection.

Can Whisper do live captions?

It was designed for recorded audio, but there are projects using it in near real time. For YouTube video subtitles, though, transcribing after recording wins on accuracy.
