Voice profiles: your phone remembers who said what, so you never have to label speakers again
Speaker diarization labels voices as Speaker 1, Speaker 2. Voice profiles go further — name a voice once, and every future recording with that person is automatically attributed. The only on-device AI notetaker with named voice profiles. No biometric data ever leaves your device.
- Voice profiles let you name a speaker once — Kuulo then recognises that voice in every future recording automatically, with no re-identification required.
- Cloud speaker recognition (AWS, Google, Azure) sends voice biometric data to external servers. Kuulo's voice profiles are stored and matched entirely on-device.
- Voiceprints used to identify a person are special category biometric data under GDPR Article 9. On-device profiling means the voice features of the people you record are never processed by a third-party server.
- Primary use cases: consultants with recurring clients, therapists with ongoing patients, journalists with source relationships, researchers with longitudinal interview cohorts.
Speaker diarization tells you that "Speaker 1" said the thing about the budget and "Speaker 2" pushed back on the timeline. That's useful. But it isn't the same as knowing Peter said it and Sarah pushed back.
Voice profiles take the next step. You tell Kuulo once that Speaker 1 is Peter. From that point forward, every recording that contains Peter's voice automatically labels his contributions with his name — not a speaker number. You never identify Peter again.
No other on-device AI notetaker does this. It requires storing a voice model, matching incoming audio against it in real time, and doing all of it without sending voice biometric data to any server. Until Apple Silicon made that compute available on a personal device, it wasn't architecturally possible to offer this privately.
The problem with speaker numbers
If you record ten meetings a month, a transcript that reads "Speaker 1: we need to push the deadline" is a detective puzzle. Which meeting? Which Speaker 1? Was this the same person as the Speaker 1 in the board meeting three weeks ago, or a different one?
Speaker numbers are ephemeral. They reset with every recording. They carry no identity across sessions. For any professional who records regularly — consultants running client programmes, researchers doing longitudinal interviews, doctors with recurring patients, journalists with ongoing source relationships — speaker numbers are accurate but not useful.
Voice profiles persist. Once you've named a voice, every future recording that includes it is labelled automatically. Your consultation notes for a patient you see every four weeks arrive pre-labelled with their name. Your board meeting transcript comes out with board member names attached to their contributions. Your six-session research interview series is already attributed correctly.
How voice profiles work
The technical process is called speaker recognition: distinguishing between speakers based on acoustic features of their voice. Matching a voice against a roster of enrolled speakers, as voice profiles do, is the speaker identification variant; speaker verification confirms a single claimed identity. Research in this field goes back decades; the US National Institute of Standards and Technology has run Speaker Recognition Evaluation benchmarks since 1996, and the technology has improved substantially with deep learning approaches.
Voice profiles in Kuulo work as follows:
Enrolment. You record a short sample of a speaker's voice, or select a past recording and designate a segment as belonging to that person. Kuulo builds a voice model — a mathematical representation of the acoustic features of that voice — stored on your device.
Matching. When a new recording is processed, the diarization engine identifies voice segments and runs them against your stored voice profiles. A segment that matches Peter's stored model within the system's confidence threshold gets labelled with his name automatically.
Correction. Speaker recognition isn't perfect — accents, background noise, illness, and similar voices can cause misidentification. Kuulo's transcript editor lets you correct any misattribution, and corrections improve the profile over time.
On-device only. Voice biometric data — the stored acoustic model for each person you've profiled — never leaves your device. No voiceprint is uploaded. No server receives the matching query. This is the critical distinction from any cloud-based speaker recognition service.
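The enrolment, matching, and correction steps above can be sketched in a few lines. This is a minimal illustration, not Kuulo's implementation: it assumes each voice segment has already been turned into a fixed-length embedding vector by some speaker model, compares embeddings with cosine similarity, and uses a made-up threshold of 0.75. The class name `VoiceProfileStore` and all its methods are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class VoiceProfileStore:
    """Hypothetical local store mapping a name to (mean embedding, sample count)."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed value, chosen for illustration only
        self.profiles = {}

    def enrol(self, name, embedding):
        """Enrolment: create a profile from one labelled voice sample."""
        self.profiles[name] = (list(embedding), 1)

    def match(self, embedding):
        """Matching: return the best enrolled name, or None below the threshold."""
        best_name, best_score = None, -1.0
        for name, (centroid, _) in self.profiles.items():
            score = cosine(embedding, centroid)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None

    def correct(self, name, embedding):
        """Correction: fold a user-relabelled segment into the profile (running mean)."""
        if name not in self.profiles:
            self.enrol(name, embedding)
            return
        centroid, n = self.profiles[name]
        updated = [(c * n + e) / (n + 1) for c, e in zip(centroid, embedding)]
        self.profiles[name] = (updated, n + 1)


store = VoiceProfileStore()
store.enrol("Peter", [0.9, 0.1, 0.2])
store.enrol("Sarah", [0.1, 0.8, 0.3])
print(store.match([0.85, 0.15, 0.25]))  # close to Peter's sample -> "Peter"
print(store.match([0.0, 0.1, 0.9]))     # unlike either profile -> None
```

Everything here lives in ordinary in-memory data structures, which mirrors the on-device point: matching needs only the stored profile and the new segment, so nothing has to be sent anywhere. Real embeddings have hundreds of dimensions; the three-element vectors are toy values.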
Why on-device matters for voice biometrics specifically
Voice used to identify someone is biometric data. In the EU under GDPR Article 9, and in the UK under the UK GDPR and the Data Protection Act 2018, biometric data processed to identify a person is a special category requiring heightened protection; in California, the CCPA/CPRA treats it as sensitive personal information. Uploading voice samples for processing, even for the limited purpose of speaker identification, involves transmitting biometric data to a third party.
Cloud speaker recognition services (AWS Transcribe Speaker Identification, Google Cloud Speaker Diarization, Azure Speaker Recognition) process voice features on remote servers. Your stored voice models live in their infrastructure. When a recording is matched against them, the audio goes out over the network.
For most consumer use cases, this is acceptable. For clinical environments, research settings, legal contexts, and any relationship where the speaker has not consented to voice biometric processing by a US technology company, it is not.
Kuulo's voice profiles are device-local. Peter's voice model is on your iPhone. Matching happens on the Neural Engine. No biometric data is transmitted. The people you've profiled never had to consent to their voice features being stored on someone else's server, because those features were never stored there.
The use cases where this changes everything
Consultants managing ongoing client relationships
A strategy engagement runs for six months. You meet the same people weekly. Without voice profiles, every meeting transcript requires manual re-identification of who said what. With profiles, the weekly meeting transcript arrives named. The attribution is there without effort.
More valuable: when you review notes from across the engagement three months in, searching for what a specific client stakeholder said about a particular issue, Kuulo can surface it by name. Not "find all mentions of this keyword spoken by Speaker 3 in six recordings." Just: "what did David say about the pricing model?"
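Once transcripts carry names instead of speaker numbers, a by-name query reduces to a simple filter. The sketch below is illustrative only: the `Segment` record and the `said_by` helper are hypothetical names, and it assumes transcripts are available as a flat list of attributed segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    recording: str  # which meeting the segment came from
    speaker: str    # a profile name, or "Speaker N" if the voice is not enrolled
    text: str


def said_by(segments, speaker, keyword):
    """Return segments attributed to `speaker` whose text mentions `keyword`."""
    needle = keyword.lower()
    return [s for s in segments
            if s.speaker == speaker and needle in s.text.lower()]


notes = [
    Segment("Week 3", "David", "The pricing model needs a lower entry tier."),
    Segment("Week 3", "Speaker 2", "I disagree about the pricing model."),
    Segment("Week 9", "David", "Timeline first, then pricing."),
    Segment("Week 9", "David", "The pricing model is now agreed."),
]

for hit in said_by(notes, "David", "pricing model"):
    print(hit.recording, "-", hit.text)
```

The same query against numbered speakers would need a per-recording lookup of which number David happened to receive, which is exactly the manual step that named profiles remove.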
Therapists and counsellors
A therapist seeing 20 clients weekly is effectively making 20 recurring recordings with the same speakers. Voice profiles mean that session notes for each client are attributed correctly without any manual effort, including the therapist's own voice as a consistent second speaker across all sessions.
The privacy architecture matters particularly here. Therapeutic clients have not consented to voice biometric processing by a third party. On-device voice profiles mean client voice data stays within the clinical relationship, which is what therapeutic confidentiality obligations and GDPR Article 9's restrictions on special category data are meant to protect.
Journalists with ongoing source relationships
An investigative journalist working on a six-month story speaks to the same sources repeatedly. Voice profiles mean that months of recordings can be searched by source name — "what did the whistleblower say about the approval process in April?" — without manually cross-referencing speaker labels across dozens of recordings.
The source protection implications are direct. As covered in AI Transcription for Journalists, cloud transcription creates a server record of source voices. On-device voice profiles store voice models locally and match locally — the source's voice data never goes anywhere.
Medical professionals with recurring patients
A GP seeing the same patient across multiple consultations benefits from voice profiles in both directions. The patient's voice is pre-identified, and the GP's own voice is the consistent second speaker. Consultation notes arrive structured and attributed without manual identification. Patient audio stays on the device — GDPR Article 9 health data that was never transmitted.
Qualitative researchers with longitudinal studies
A researcher doing follow-up interviews with the same cohort over a year, as is common in longitudinal qualitative studies, can identify participants by voice profile across the entire study. This makes cross-session analysis tractable: query what a specific participant said at each interview stage without manually matching speaker numbers across recordings. As discussed in AI Transcription for Qualitative Researchers, research ethics boards increasingly scrutinise AI tool data flows. On-device voice profiles keep participant voice data in the researcher's own custody, which fits even strict interpretations of data custody requirements.
What voice profiles don't do
Voice profiles are not a perfect identification system. They are a personal productivity tool for recognising voices you have deliberately enrolled. A few important limits:
They work for enrolled voices only. A new speaker in a recording is labelled "Speaker N" until you identify them. Profiles don't recognise anyone you haven't enrolled.
Accuracy degrades at the margins. Very similar voices, significant background noise, poor microphone placement, or a speaker with a cold can cause mismatches. The correction workflow in Kuulo's transcript editor is fast — tap a label, reassign it — and corrections improve the model.
They are not surveillance tools. Kuulo's voice profiles are enrolled by you, for people whose recordings you capture with their knowledge. They are a note attribution tool, not a voice identification system for unknown speakers.
They don't identify voices across devices. Your voice profiles exist on your device. They don't sync to a server (by design) and are not shared between users.
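The first limit above, that unenrolled voices keep a speaker number, can be sketched as a labelling pass over the voices found in one recording. This is an assumption-laden illustration: `label_clusters` is a hypothetical helper, `match` stands in for whatever profile lookup is used, and numbering the unknown voices from 1 in order of appearance is a choice made here, not documented Kuulo behaviour.

```python
def label_clusters(cluster_ids, match):
    """Map each diarized voice cluster to a profile name or a fallback number.

    `cluster_ids` lists the distinct voices in one recording, in order of
    first appearance. `match` is any callable that returns an enrolled name
    for a cluster, or None when no profile matches.
    """
    labels = {}
    next_number = 1
    for cluster_id in cluster_ids:
        name = match(cluster_id)
        if name is not None:
            labels[cluster_id] = name
        else:
            # No enrolled profile matched: keep a per-recording speaker number.
            labels[cluster_id] = f"Speaker {next_number}"
            next_number += 1
    return labels


# Toy lookup: only the first voice in this recording has an enrolled profile.
enrolled = {"voice-a": "Peter"}
print(label_clusters(["voice-a", "voice-b", "voice-c"], enrolled.get))
```

Naming one of the numbered speakers afterwards would amount to enrolling that voice, after which later recordings label it by name.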
The competitive picture
No other on-device AI notetaker currently offers voice profiles. The other on-device tools (Whisper Notes, Aiko, VoiceScriber) do not offer diarization at all, let alone named voice profiles. (See Kuulo vs Whisper Notes for that comparison.)
Cloud tools including Otter.ai and some enterprise versions of Fireflies and Zoom AI Companion offer forms of speaker recognition, but all of them process voice features on cloud servers. For the professions where on-device matters (clinicians, lawyers, researchers, journalists), that disqualifies them.
Kuulo is the first app to offer named voice profiles with entirely on-device processing. The underlying capability — matching voice against a locally stored model using the Neural Engine — became possible with Apple Silicon at a quality level suitable for consistent professional use. It's now available to anyone who needs it, with the privacy guarantee that the architecture makes possible: the people in your recordings stay in your recordings, not in a server somewhere.
Frequently asked questions
Can an AI app remember and identify specific speakers across recordings?
Yes. Kuulo's voice profiles enrol a speaker's voice once — from a recording or a short sample — and then automatically identify that voice in future recordings. The result is named attribution rather than Speaker 1/Speaker 2.
Is voice recognition for meeting notes private?
With on-device processing, yes. Kuulo stores voice models on your device and performs matching on the Neural Engine — no voice biometric data is transmitted. Cloud speaker recognition services send voice features to their servers for matching.
How accurate is on-device voice profile matching?
Accuracy is high for enrolled speakers with clearly distinct voices in normal acoustic conditions. Mismatches occur with very similar voices, heavy background noise, or significant voice changes (illness, stress). Kuulo's transcript editor allows fast correction, and corrections improve the profile over time.
What's the difference between speaker diarization and voice profiles?
Diarization detects that multiple distinct voices exist in a recording and labels them Speaker 1, Speaker 2, and so on, with a new label set for each recording. Voice profiles match voices against a named roster you've enrolled, so "Peter" appears as Peter across all recordings, not as a different speaker number each time.