Google is back with yet another AI service—this time, an offline dictation program using its “Gemma” architecture. But rather than include it within the Gemini app, or as a Gemini function, the company has decided to roll it out into a dedicated iPhone app, with the very catchy name of “Google AI Edge Eloquent.”
I decided to give the app a shot on release day, though the privacy policy gave me pause. Google says that your location, contacts, identifiers, device diagnostics, contact info, user content, usage data, and “other” data can be linked to you, while purchases and other diagnostics can be collected but not linked to you. That’s a lot of data, especially for an app that advertises that “audio, confidential conversations, and personal data never leave your device,” and I’m not sure I’d be keen on downloading the app if I weren’t testing it. But, as the saying goes, if a service is free, you are the product. I’ve reached out to Google for clarification here, and will update this story if I hear back.
How to try Google’s new AI transcription app
Once you download the app, setup is easy: you record a sample phrase the app tells you to say, then choose between two modes. “On-device mode” is fully offline and stores your conversations only on your device. “Enhanced text polishing” keeps the audio on your device but uses Gemini to “polish” your text, which requires sending data to the cloud (and is presumably where all that aforementioned privacy policy data is going). You won’t need to keep Gemini on for the app to do a basic edit of your transcript, though: by design, the app removes “filler” words like “um.” Keep in mind that the app seems to open in “Enhanced text polishing” mode by default, or at least that’s how it worked on my end. But a simple tap of a toggle in the top-right corner of the main screen switches you into “On-device mode.”
I had some trouble getting the app up and running: Every time I tried to test it, it claimed I didn’t speak at all. But after pairing AirPods with my iPhone and unpairing them, the app seemed to work. To test the app, I played the intro of this Audio University YouTube video, which is entirely dialogue-based. Once the app was working, it immediately started transcribing the video with near-perfect accuracy, at least by the end. I would watch the app enter incorrect words, then retract and replace them as subsequent words provided context. Once the recording was finished, the transcript was nearly identical to the video’s transcript, save for a couple of quirks: It mistakenly thought “If this is our first time meeting” was “This is our first time meeting,” and recorded a single sentence twice. But other than that, this is a totally usable transcript of the beginning of the video.