Offline live translation: how on-device AI handles multilingual conversations without internet
Most translation apps require internet and send your audio to a server. On-device live translation works without signal, keeps audio private, and integrates directly with note-taking — relevant for clinical multilingual consultations, research fieldwork, and international business.
- Most live translation tools require internet and send audio to cloud servers — problematic in offline environments and for sensitive conversations.
- On-device live translation on Apple Silicon Neural Engine works without signal, keeping all audio on the device.
- The clinical use case: patient audio translated on-device has not been transmitted for processing — categorically different from using Google Translate in a consultation.
- Kuulo's live translation is integrated with transcription and note generation — the translated content becomes part of the structured note, not a separate app output.
The problem with most translation apps is that they require internet. This is fine when you're standing outside a restaurant in Rome trying to read the menu. It is not fine when you're conducting a multilingual clinical consultation in a basement ward with no signal, interviewing a participant in a rural fieldwork location, or sitting in a diplomatic briefing where the wifi password is not forthcoming.
The problem with cloud-based real-time translation is different: it means sending the audio of your conversation to a server. For a conversation that contains protected health information, confidential business discussions, or a research participant's private disclosure, the translation infrastructure becomes a data governance problem.
On-device live translation solves both problems. It works without internet. It keeps audio on the device.
What live translation actually means
Live translation refers to the conversion of spoken audio in one language into text (or speech) in another language, in real time. This is distinct from:
- Post-hoc translation: transcribing a recording in one language and then translating the transcript. This is accurate but delayed; you get the translated content after the conversation, not during it.
- Document translation: translating written text. No audio is involved.
- Consecutive interpretation: a human interpreter listens to a portion of speech, then translates it from memory. This is how professional diplomatic interpretation works; it requires a skilled practitioner and adds significant time to the conversation.
Live translation is the attempt to do what consecutive interpretation does, in real time, automatically, without requiring a human interpreter for every bilingual interaction.
How on-device live translation works
Translation is harder than transcription. Transcription maps sound to text in the same language; translation maps sound to text in a different language, which demands semantic understanding, grammatical transformation, and vocabulary selection that go beyond acoustic mapping.
The recent development that makes on-device live translation possible is the convergence of transformer-based translation models with mobile hardware capable of running them. Apple Silicon's Neural Engine, the dedicated machine learning accelerator in the A-series chips used in iPhones and the M-series chips used in Macs, provides enough matrix multiplication throughput to run real-time multilingual translation without server-side compute.
Kuulo's live translation runs on-device, in real time, processing incoming audio and producing translated text continuously. The source audio never leaves the device. The translation happens locally.
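The continuous, local flow described above can be pictured as a small streaming loop. This is a minimal sketch under stated assumptions, not Kuulo's implementation: the names `live_translate`, `TranslatedSegment`, `fake_transcribe` and `fake_translate` are hypothetical, and the two stand-in "models" are dictionary lookups where a real system would run neural networks on the device.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class TranslatedSegment:
    source_text: str      # transcript in the speaker's language
    translated_text: str  # translation in the listener's language


def live_translate(
    audio_chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
) -> Iterator[TranslatedSegment]:
    """Yield one translated segment per non-silent chunk, as chunks arrive.

    Both callables are assumed to run locally; nothing in this loop
    performs network I/O.
    """
    for chunk in audio_chunks:
        source_text = transcribe(chunk)
        if not source_text:
            continue  # silence or noise: nothing to translate
        yield TranslatedSegment(source_text, translate(source_text))


# Hypothetical stand-ins so the sketch runs without any real model.
FAKE_SPEECH = {b"\x01": "bonjour", b"\x02": "merci"}
FAKE_DICTIONARY = {"bonjour": "hello", "merci": "thank you"}


def fake_transcribe(chunk: bytes) -> str:
    return FAKE_SPEECH.get(chunk, "")


def fake_translate(text: str) -> str:
    return FAKE_DICTIONARY.get(text, text)


segments = list(
    live_translate([b"\x01", b"\x00", b"\x02"], fake_transcribe, fake_translate)
)
for segment in segments:
    print(segment.source_text, "->", segment.translated_text)
```

Because the loop is a generator, each segment becomes available as soon as its chunk has been processed, which is the property that separates live translation from translating a finished recording.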
Apple vs Google: the current landscape
Apple Translate (built into iOS/macOS): supports 20 languages, works offline for downloaded language pairs, produces clean translated text from spoken input. The offline capability is genuine and works well for supported language pairs. Limitations: fixed at 20 languages, no integration with note-taking or transcription workflows, outputs to the Translate app only.
Google Translate: supports more than 130 languages and includes a Conversation mode for two-way live translation. The language breadth is substantially wider than Apple's offering. Most real-time translation depends on the cloud; offline download packs exist for some languages, but with reduced accuracy. The Conversation mode works well for bilateral exchanges, while the offline mode is more limited.
Kuulo: on-device live translation integrated with transcription and note generation. The translated text is part of the same note — you receive a transcription in the source language and a translation simultaneously, structured within the note format (lecture summary, meeting minutes, SOAP note). The integration is the differentiator: translation that feeds directly into the note you're generating, not a separate app output.
For pure bilateral translation between two supported languages with internet, Google Translate's Conversation mode is excellent and requires no other tool. For translation integrated into a note-taking workflow, with on-device privacy, the question is different.
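To make the "same note" idea concrete, here is one way a note that carries both the source transcript and its translation might be modelled. It is an illustrative sketch only: the `BilingualNote` class, its fields and its rendering are assumptions made for this example and do not describe Kuulo's actual note format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BilingualNote:
    note_format: str  # e.g. "Meeting minutes", "SOAP note"
    source_lang: str
    target_lang: str
    # Each entry pairs a source-language sentence with its translation.
    segments: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, source_text: str, translated_text: str) -> None:
        self.segments.append((source_text, translated_text))

    def render(self) -> str:
        """Return the note as text, with each sentence shown in both languages."""
        lines = [f"{self.note_format} ({self.source_lang} -> {self.target_lang})"]
        for source_text, translated_text in self.segments:
            lines.append(f"[{self.source_lang}] {source_text}")
            lines.append(f"[{self.target_lang}] {translated_text}")
        return "\n".join(lines)


note = BilingualNote("Meeting minutes", "fr", "en")
note.add("Nous livrons lundi.", "We deliver on Monday.")
print(note.render())
```

Keeping the pair together in one record means the original wording stays available for checking whenever the translation is in doubt.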
Clinical multilingual consultations
NHS clinical settings see patients speaking dozens of languages. Professional medical interpretation is the standard, and NICE guidance is clear that ad hoc family interpretation is inappropriate for complex or sensitive consultations. But professional interpreters are expensive, not always available, and require booking in advance. In areas with large multilingual populations, the gap between guideline and practice is significant.
An on-device live translation tool does not replace a professional interpreter for high-stakes clinical decisions. It does provide a tool for the situations that fall short of full interpretation: the brief exchange to establish presenting complaint before the interpreter arrives, the follow-up consultation where the main issue has already been established, the situation where the patient speaks some English but not medical English.
From a GDPR perspective the position is simpler: patient audio that is translated on-device has not been transmitted to a third party for processing. No Article 9 health data has left the clinician's device. This is categorically different from using a cloud service such as Google Translate for a clinical conversation, which sends audio to Google's servers and creates a cloud data trail of a clinical exchange.
For clinicians working in areas with significant multilingual populations, on-device live translation is a tool for the situations where full professional interpretation is not available or not the appropriate level of resource — while maintaining the privacy architecture that clinical audio requires.
Research interviews across language barriers
Qualitative researchers frequently conduct interviews in languages that are not their primary language — fieldwork in a participant's country, diaspora research with communities in UK cities, comparative research requiring data collection in multiple national contexts.
Current practice for multilingual research interviews typically involves: conducting the interview in the shared language (limiting participant expression), using a professional interpreter (expensive and adding an additional relationship to a sensitive dynamic), or recording in the participant's language and having the transcript translated post-hoc (accurate but delayed and costly).
On-device live translation provides a fourth option: the researcher follows the interview in real time in translation, the participant speaks in their preferred language, and the resulting note contains both the transcript and the translation. The interview is not mediated by a second person. The participant speaks naturally. The researcher has context sufficient to probe during the interview, and a complete translated record afterwards.
For research where the additional presence of an interpreter would change the dynamics of the disclosure — therapy research, trauma studies, sensitive social research — this matters.
Business and conference settings
International business meetings, client calls with overseas partners, conference presentations in languages that attendees don't speak — all of these create translation needs that fall outside what Google Translate's casual bilateral mode handles well.
A structured business meeting note that includes the translated content of a French partner's contribution, integrated alongside the English contributions and attributed to the French speaker by diarization, is a different quality of record than a note that says "Louis spoke (in French)" and relies on whoever in the room understood French to summarise.
For businesses operating across language boundaries — an increasingly common reality for both large enterprises and smaller companies with international clients — on-device live translation that produces a structured, multilingual, attributed meeting note is a qualitatively different capability from a translation app opened in a separate window.
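The attributed, multilingual meeting record described in this section can be sketched as follows. This is an illustration under assumptions, not a description of any product's internals: `Turn`, `render_minutes`, `PHRASEBOOK` and `stub_translate` are hypothetical names, speaker labels are taken as already supplied by diarization, and the translator is a lookup table standing in for an on-device model.

```python
from typing import Callable, List, NamedTuple


class Turn(NamedTuple):
    speaker: str   # label assigned by diarization
    language: str  # language code of the spoken turn
    text: str      # transcript in the language that was spoken


def render_minutes(
    turns: List[Turn],
    note_language: str,
    translate: Callable[[str, str, str], str],
) -> List[str]:
    """Render attributed minutes, translating only turns in other languages."""
    lines = []
    for turn in turns:
        if turn.language == note_language:
            lines.append(f"{turn.speaker}: {turn.text}")
        else:
            translated = translate(turn.text, turn.language, note_language)
            lines.append(
                f"{turn.speaker} (translated from {turn.language}): {translated}"
            )
            lines.append(f"  original: {turn.text}")
    return lines


# Hypothetical stand-in for a local translation model.
PHRASEBOOK = {("fr", "en", "Le contrat est conforme."): "The contract is compliant."}


def stub_translate(text: str, source: str, target: str) -> str:
    return PHRASEBOOK[(source, target, text)]


turns = [
    Turn("Anna", "en", "Shall we start?"),
    Turn("Louis", "fr", "Le contrat est conforme."),
]
minutes = render_minutes(turns, "en", stub_translate)
print("\n".join(minutes))
```

The French contribution appears in the minutes attributed to its speaker, with the original sentence preserved beneath the translation, rather than as a placeholder noting that someone spoke in French.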
Two-way conversation translation
The bilateral translation use case — two people, two languages, a live conversation — is different from the unilateral use case of following a presentation or lecture in another language.
Kuulo's live translation handles both. For a two-way conversation, both participants speak into the same device, and the translated output appears for both participants. This is functionally similar to Google Translate's Conversation mode, with the key difference that audio doesn't leave the device.
The two-way use case is where the privacy argument is most acute: a conversation, by definition, involves two people's voices, two people's words, and potentially two people's private disclosures. On-device processing means neither participant's voice is transmitted to a server during the exchange.
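The routing logic behind a two-way exchange on a single device can be sketched in a few lines: whichever of the two languages was spoken, the output goes to the other. The functions `other_language` and `route_utterance`, the `PHRASES` table and the stub translator are hypothetical, and the sketch assumes the spoken language of each utterance has already been identified.

```python
from typing import Callable, Tuple


def other_language(spoken: str, pair: Tuple[str, str]) -> str:
    """Return the language of the pair that was NOT spoken."""
    first, second = pair
    if spoken == first:
        return second
    if spoken == second:
        return first
    raise ValueError(f"{spoken!r} is not one of {pair}")


def route_utterance(
    text: str,
    spoken: str,
    pair: Tuple[str, str],
    translate: Callable[[str, str, str], str],
) -> Tuple[str, str]:
    """Translate an utterance into the other participant's language."""
    target = other_language(spoken, pair)
    return target, translate(text, spoken, target)


# Hypothetical stand-in for a local translation model.
PHRASES = {
    ("en", "it", "Where does it hurt?"): "Dove fa male?",
    ("it", "en", "Mi fa male la testa."): "My head hurts.",
}


def stub_translate(text: str, source: str, target: str) -> str:
    return PHRASES[(source, target, text)]


pair = ("en", "it")
print(route_utterance("Where does it hurt?", "en", pair, stub_translate))
print(route_utterance("Mi fa male la testa.", "it", pair, stub_translate))
```

Every step here is an ordinary local function call, which is why neither participant's words need to leave the device for the exchange to work.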
The offline requirement
Live translation typically fails where internet fails. The scenarios where translation is most needed in professional contexts are often exactly the ones where internet is most unreliable.
A ward basement with poor signal. A rural fieldwork location. An international conference centre where the wifi is overloaded. A consultation room in a building where mobile data doesn't penetrate.
On-device live translation works in all of these. The Neural Engine doesn't need a connection. The translation runs regardless of signal quality. The contexts that cloud-based translation can't reliably reach are exactly the contexts where Kuulo's on-device architecture matters most.
Frequently asked questions
Does live translation work without internet?
Yes. Kuulo's live translation runs on the Neural Engine in an iPhone's Apple Silicon chip, so no internet connection is required. Translation happens on-device in real time, regardless of signal quality.
Can I use an AI translator in clinical settings?
On-device translation like Kuulo's keeps patient audio on the clinician's device — no audio is transmitted for processing, which is categorically different from cloud translation services. On-device translation is appropriate for supplementary use alongside professional medical interpretation; it does not replace professional interpretation for high-stakes clinical decisions.
What's the difference between Apple Translate and Kuulo for live translation?
Apple Translate supports 20 languages offline and is excellent for bilateral translation. Kuulo's live translation is integrated with note-taking — the translated content becomes part of the meeting minutes, lecture summary, or clinical note, rather than an isolated translation output.
How accurate is on-device live translation?
On-device translation accuracy on current iPhone hardware is strong for major language pairs and in quiet environments. Cloud translation services (Google, DeepL) have accuracy advantages for some language pairs and noisier conditions. For contexts where audio cannot be sent to a server, such as clinical, legal, and research settings, on-device translation is the architecturally sound option.