June 20, 2026 · 9 min read

Generate subtitle files from any recording in seconds — offline, on your iPhone

Kuulo exports SRT and VTT subtitle files from any recording, entirely on-device. No upload, no per-minute charge, no internet required. For media students, podcasters, streamers, YouTubers, and anyone producing captioned content — the fastest subtitle workflow that keeps your audio private.

Key takeaways
  • Kuulo generates SRT and VTT subtitle files from any recording, offline, on-device — ready to import into any video editor or hosting platform.
  • Cloud captioning services (Rev.com, Otter export, Descript) send audio to servers and charge per minute or by subscription. On-device subtitle generation has no marginal cost.
  • Pre-release, confidential, or sensitive content can be captioned without the audio leaving the device — relevant for journalists, filmmakers, clinical educators, and corporate content teams.
  • The WCAG 2.1 accessibility case and the creator workflow case are served by the same tool: both need accurate captions, fast, without an upload.

Captions used to be something you added after the fact — a paid service, a manual process, a step that got skipped because it took too long. A video went live without them because captioning it would cost more time or money than felt justified.

On-device AI changes that calculation. Record your content, or import the audio file, and Kuulo generates an SRT or VTT subtitle file in seconds. No upload, no cloud service, no per-minute charge. The file is ready before you've finished exporting the video.

This is new. And it matters more to more people than it might initially seem.

Who needs subtitle files

The obvious answer is content creators — YouTubers, podcasters, streamers, TikTok creators. That's a large group. As of 2024, YouTube hosts over 800 million videos, Spotify hosts over 5 million active podcasts, and Twitch sees millions of live streams weekly. The majority of this content has inadequate or no captions.

But the need extends well beyond professional creators:

Media and communications students producing coursework video (short documentaries, interviews, broadcast packages, social media productions) are increasingly assessed on accessibility compliance. Many universities in the UK and US require WCAG 2.1 Level AA compliance for published content, which includes captions for pre-recorded audio. A student producing a 15-minute documentary for a final portfolio needs accurate subtitles. Kuulo generates them from a recording in the time it takes to make a coffee.

Conference and event organisers producing recorded panel sessions, keynote recordings, or webinar replays. Adding captions to these typically requires sending the recording out to a service. With Kuulo, the subtitle file is generated locally from the recording, ready to upload alongside the video.

Language learners using captioned video as a comprehension tool. Kuulo's live translation feature means you can record in one language, get a translated transcript, and generate subtitles in a second language — all on-device. For language learning content, self-produced immersion material, or foreign-language classroom recordings, this opens a workflow that previously required multiple tools and an internet connection.

Corporate L&D and training teams producing internal video content — onboarding recordings, process walkthroughs, recorded training sessions — that require subtitles for accessibility, often under legal obligations for employees with hearing impairments.

What subtitle file formats actually are

An SRT (SubRip Subtitle) file is a plain text file with a specific structure: a sequential number, a timestamp showing when the subtitle appears and disappears, and the subtitle text itself. It looks like this:

1
00:00:03,400 --> 00:00:06,700
The budget discussion ran through three distinct phases.

2
00:00:06,800 --> 00:00:09,200
First, the allocation question.

VTT (WebVTT) is the web-standard format, used for HTML5 video, YouTube, and most modern platforms. Both formats are widely accepted by video editing software (Premiere, Final Cut, DaVinci Resolve), video hosting platforms (YouTube, Vimeo, LinkedIn Video), and podcast transcript services.

Kuulo generates both. The file can be exported directly and imported into any standard editing or publishing workflow.
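To make the difference between the two formats concrete, here is a minimal Python sketch that renders the same cues as SRT and as WebVTT. It is an illustration of the file formats only, not Kuulo's implementation; the `Cue` class and the function names are hypothetical. The visible differences are small: WebVTT starts with a `WEBVTT` header line, uses a period rather than a comma before the milliseconds, and does not need the sequential cue numbers.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue (hypothetical structure, for illustration only)."""
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def format_timestamp(seconds: float, decimal_mark: str) -> str:
    """Render seconds as HH:MM:SS<mark>mmm (',' for SRT, '.' for WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """SRT: numbered blocks, comma before milliseconds, blank line between cues."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start, ',')} --> {format_timestamp(cue.end, ',')}"
        blocks.append(f"{index}\n{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


def to_vtt(cues: list[Cue]) -> str:
    """WebVTT: 'WEBVTT' header, period before milliseconds, numbering optional."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{format_timestamp(cue.start, '.')} --> {format_timestamp(cue.end, '.')}"
        blocks.append(f"{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(3.4, 6.7, "The budget discussion ran through three distinct phases."),
    Cue(6.8, 9.2, "First, the allocation question."),
]
print(to_srt(cues))
print(to_vtt(cues))
```

The SRT output reproduces the two-cue example shown earlier; the WebVTT output carries the same text and timings under a `WEBVTT` header.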

The cost comparison

Professional human captioning services charge approximately $1.00–$2.00 per minute in the US market (Rev.com, 3Play Media, Verbit). For a 30-minute podcast episode, that's $30–$60 per episode. For a weekly podcast, that comes to roughly $1,560–$3,120 per year.
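The arithmetic is easy to rerun for your own schedule. The sketch below is a back-of-the-envelope calculation, assuming 52 episodes a year for a weekly show; the function name and the default are illustrative, and the rates are the approximate per-minute range quoted above.

```python
def annual_captioning_cost(
    rate_per_minute: float,
    minutes_per_episode: int,
    episodes_per_year: int = 52,  # assumption: one episode every week
) -> float:
    """Yearly spend on per-minute captioning for a regular show."""
    return rate_per_minute * minutes_per_episode * episodes_per_year


low = annual_captioning_cost(1.00, 30)
high = annual_captioning_cost(2.00, 30)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Swap in your own episode length and release cadence to see where a per-minute service lands for your output.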

Automated cloud captioning (Otter.ai's export, Descript, Adobe Premiere's Speech to Text, Kapwing) is cheaper but requires an internet connection, sends your audio to a server, and typically charges either per-minute or as part of a monthly subscription.

Kuulo generates subtitle files from recordings already on your device, offline, as part of the note-generation workflow. The marginal cost of adding subtitle export to a recording you've already made is nothing.

For media students and independent creators working with tight budgets, this is a real change. For content production teams generating hundreds of minutes of captioned content per month, it changes the unit economics of an accessibility workflow.

Why "offline" matters for subtitle generation

Most automated captioning tools require your audio file to travel to a server. For the majority of use cases — YouTube videos, podcast episodes, public educational content — this is fine. The audio will be public anyway.

For a number of content scenarios, it isn't fine:

Pre-release content. A podcast episode, documentary, or video that hasn't been published yet. Sending pre-release audio to a third-party captioning service means the content is on a server before you've decided who gets access to it. For content under embargo, this is a leak risk.

Sensitive interview content. A documentary filmmaker who has recorded interviews with individuals under confidentiality agreements. A journalist producing video journalism from protected source conversations. The audio cannot be sent to a third party for captioning without violating the commitment.

Corporate and internal content. A company producing internal training videos, town hall recordings, or onboarding content that covers strategy, personnel, or commercially sensitive information. Internal content regularly gets captioned by external services without the content owners thinking carefully about it.

Medical and therapeutic video. Some clinicians and educators produce video content covering patient scenarios, training material, or clinical demonstrations. On-device captioning keeps this content within the appropriate data boundary.

Kuulo's offline subtitle generation handles all of these scenarios the same way: no upload, no cloud processing, subtitle file available immediately.

Accuracy and the editing step

Automated captioning — cloud or on-device — produces accurate transcripts for clear speech in normal acoustic conditions. Independent benchmarks consistently show Word Error Rates below 10% for modern speech recognition models on clean audio, often below 5% for native speaker English with a good microphone.

What this means practically: a cleanly recorded 30-minute podcast episode might have as few as 15–25 words that need correcting, and noisier audio will need more. With Kuulo's transcript editor, corrections are fast: click the word, type the correction, done. The final SRT is exported after review.
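To relate a Word Error Rate to an actual editing workload, a rough estimate helps. The sketch below assumes a speaking pace of about 150 words per minute, which is a common rule of thumb rather than a measured figure for any particular recording; the function name is illustrative.

```python
def expected_corrections(
    minutes: float,
    word_error_rate: float,
    words_per_minute: int = 150,  # assumption: typical conversational pace
) -> int:
    """Approximate number of words to fix for a recording of a given length."""
    total_words = minutes * words_per_minute
    return round(total_words * word_error_rate)


for wer in (0.004, 0.05, 0.10):
    print(f"WER {wer:.1%}: about {expected_corrections(30, wer)} corrections")
```

Under these assumptions a 30-minute episode runs to about 4,500 words, so a 5% error rate means roughly 225 corrections, while 15–25 corrections corresponds to an error rate of about 0.3% to 0.6%. Recording quality, not the export step, decides which end of that range you land on.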

For broadcast-quality content where every word matters — news, documentary, commercial production — the subtitle export from Kuulo is a starting point that dramatically reduces the time required compared to manual captioning from scratch, even if a full accuracy review is done before publishing.

For classroom projects, internal content, and social media creation where the standard is "good enough to be useful," the raw export is typically publishable without editing.

Workflow: from recording to subtitle file

The end-to-end workflow is straightforward:

  1. Record or import. Record live audio through Kuulo, or import a video file's audio. Both paths work offline.
  2. Transcription. On-device transcription runs in real time for live recording, or processes an imported file faster than real time. A 30-minute recording takes roughly 2–4 minutes to process on current iPhone hardware.
  3. Review. The transcript editor shows the full text with timestamps. Correct any errors.
  4. Export. Choose SRT or VTT. The file exports to your device's Files app, ready to share, AirDrop, import into a video editor, or upload to a hosting platform.

The whole process from a completed recording to a ready subtitle file is typically under 10 minutes for a standard-length episode or lecture. With a clean recording, often under 5.
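If you edit an exported file by hand before uploading it, a quick structural check can catch slips such as a renumbered cue or overlapping timings. The Python sketch below is one possible check, not part of Kuulo; `check_srt` and its messages are hypothetical, and it assumes the plain SRT layout shown earlier (index line, timing line, then text).

```python
import re

# Matches a timing line such as "00:00:03,400 --> 00:00:06,700".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    previous_end = 0
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for expected, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 3:
            problems.append(f"cue {expected}: missing index, timing, or text")
            continue
        if lines[0].strip() != str(expected):
            problems.append(f"cue {expected}: index is {lines[0].strip()!r}")
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"cue {expected}: unreadable timing line")
            continue
        parts = match.groups()
        start, end = to_ms(*parts[:4]), to_ms(*parts[4:])
        if end <= start:
            problems.append(f"cue {expected}: ends before it starts")
        if start < previous_end:
            problems.append(f"cue {expected}: overlaps the previous cue")
        previous_end = end
    return problems


sample = """1
00:00:03,400 --> 00:00:06,700
The budget discussion ran through three distinct phases.

2
00:00:06,800 --> 00:00:09,200
First, the allocation question.
"""
print(check_srt(sample))
```

The check only looks at structure. It says nothing about whether the words are right, so it complements the review step rather than replacing it.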

What this means for accessibility at scale

The Web Content Accessibility Guidelines (WCAG) 2.1 require captions for pre-recorded audio in video content (Success Criterion 1.2.2), and WCAG is the standard most commonly used to judge whether organisations meet their accessibility obligations under laws such as the Equality Act 2010 (UK) or the Americans with Disabilities Act (US). A 2023 survey by AbilityNet found that 73% of UK university websites failed at least one Level AA WCAG criterion. Captioning is one of the most consistently missed requirements.

For the content creators, students, and organisations producing video content at scale, the bottleneck has historically been the cost and effort of captioning. On-device automated subtitle generation removes the cost entirely and reduces the effort to minutes.

The accessibility case and the creator workflow case are the same tool serving different motivations. A media student who needs WCAG-compliant subtitles for their final project portfolio and a podcaster who wants searchable episode transcripts for SEO are both served by the same export. The fact that neither of them has to send their audio anywhere to get there is either important or irrelevant depending on their content. But the option is there either way.

Kuulo is the first on-device AI notetaker to offer full SRT and VTT subtitle export. The capability exists because the underlying transcription pipeline is entirely local — there's no architectural reason to limit the output to a formatted summary rather than a timestamped subtitle file. The transcript was always there. The export just makes it portable.

Frequently asked questions

Can I generate subtitle files offline on iPhone?

Yes. Kuulo transcribes on-device and exports the timed transcript as an SRT or VTT file — ready to import into Final Cut Pro, DaVinci Resolve, Premiere, YouTube, or any platform that accepts standard subtitle formats. No internet required at any stage.

How accurate is on-device subtitle generation?

For clear speech with a decent microphone, Word Error Rates of 5–10% or lower are typical for modern on-device Whisper-based models. A cleanly recorded 30-minute recording may need as few as 15–25 word corrections in the transcript editor before publishing. For accessibility use and most creator workflows, the raw export is often publishable without editing.

How much does it cost to generate subtitles with Kuulo vs Rev or Otter?

Rev.com charges approximately $1.00–$2.00 per minute for human captioning. Otter's subtitle export is included in paid plans ($16.99/month). Kuulo generates subtitle files from recordings already on your device at no additional cost: no per-minute charge, no subscription required for the export.

What subtitle formats does Kuulo export?

Kuulo exports SRT (SubRip Subtitle) and VTT (WebVTT) formats. Both are accepted by major video editing applications (Premiere, Final Cut, DaVinci Resolve) and video hosting platforms (YouTube, Vimeo, LinkedIn Video). SRT is the most universally compatible; VTT is the web standard for HTML5 video.

Try Kuulo

On-device AI notes, private by design. Free for iPhone and Mac.
