Dragon ID’s mobile unlock by voice

A brief but interesting story on GigaOm.

Nuance, the company behind the Dragon family of voice recognition products, is promoting a mobile app called Dragon ID.  The app replaces standard user authentication schemes like PINs or swipe patterns by matching the speech characteristics of a user against a known set of characteristics.  It’s the old “My voice is my passport” idea that we (or at least I) saw in “Sneakers”: the user speaks a phrase into the device, the device checks whether the user’s speech has the same x, y, and z as the real Mr. User’s, and it accepts or rejects the attempt.

At first blush this looks like a UX winner.  The user doesn’t have to remember any complicated passwords, PINs, or other meaningless tokens.  And it would be impossible for the user to lose his authenticator, his voice, except by disease or injury.

But there are some security considerations that must be satisfied for this to be an acceptable gatekeeper for a mobile device.  The most obvious weakness of this system is the replay attack: literally playing back a recording of the user authenticating.  What countermeasures does Dragon ID use to prevent such simple attacks?  Presumably Dragon ID analyzes the audio to ensure that the voice is coming from a point directly in front of the device or headset microphone, but this would not be a robust defense.  Can it detect artifacts of digital audio reproduction?  Of audio compression schemes like MP3?  Does it emit a one-time audio watermark via the speaker during recording so that a replay would be easily detected?  I’d certainly love to know.
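
To make the watermark question concrete, here is a toy sketch of the idea.  It is purely my own illustration, not anything Dragon ID is known to do.  It assumes the device can emit a one-time token through the speaker, that the token survives being picked up by the microphone, and that it can be extracted again afterwards; none of that is trivial with real audio.  The names new_watermark, record, and is_live are all made up.

```python
import secrets

def new_watermark():
    """One-time token the device would (hypothetically) emit via its speaker."""
    return secrets.token_hex(8)

def record(voice, emitted_watermark, replayed_from=None):
    """Toy microphone capture: the set of watermarks heard during one attempt.

    A replayed recording drags along whatever watermark was audible
    when it was originally captured.
    """
    heard = {emitted_watermark}
    if replayed_from is not None:
        heard |= replayed_from["watermarks"]
    return {"voice": voice, "watermarks": heard}

def is_live(capture, expected_watermark):
    """Accept only if the current watermark is the sole one present."""
    return capture["watermarks"] == {expected_watermark}

# A genuine attempt carries only the current session's watermark.
w1 = new_watermark()
live = record("alice", w1)

# An attacker replays that recording during a later session.
w2 = new_watermark()
replay = record("alice", w2, replayed_from=live)
```

Note the limit of the trick even in this toy form: it only catches recordings made during an authentication session.  A recording of the user saying the phrase anywhere else carries no stale watermark at all.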

Pattern matching is performed against an established set of phrases recorded by the user.  This simplifies the task of matching a candidate audio sample’s characteristics against a known set of characteristics, but it presumably reduces the amount of work an attacker would need to put into making a passable authenticator.  In a perfect world, the app would compose a unique phrase for each attempted authentication, each log-in, so that an attacker would have no real template for a “good guess”.  The attacker would need to know about a user’s full range of accents, inflections, cadences, etc., in order to make a passable authenticator, and he would only get one shot at each phrase.  With a known subset of authenticators (like a decent recording of one successful authentication attempt), the attacker knows what the phrase will be for any future attempt and that he will only have to polish it somehow for it to be acceptable.
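
The “unique phrase for each attempt” idea is easy to sketch, at least on the prompting side.  What follows is a minimal illustration under my own assumptions: the word list is invented, and a real system would need enrolled speech covering every word it might ask for.  It shows nothing about how Dragon ID actually picks phrases.

```python
import secrets

# Hypothetical vocabulary; a real system would draw on words the user enrolled.
WORDS = ["amber", "river", "seven", "window",
         "copper", "north", "violet", "thunder"]

def compose_challenge(n_words=4):
    """Pick n_words distinct words in random order using the OS's CSPRNG."""
    return " ".join(secrets.SystemRandom().sample(WORDS, n_words))

def phrase_space(vocab_size, n_words):
    """Count the ordered phrases of n_words distinct words."""
    total = 1
    for i in range(n_words):
        total *= vocab_size - i
    return total

challenge = compose_challenge()   # e.g. four of the eight words, shuffled
space = phrase_space(len(WORDS), 4)   # 8 * 7 * 6 * 5 = 1680
```

Even this tiny vocabulary yields 1680 possible four-word prompts, so a recording of one successful log-in is unlikely to match the next prompt.  An attacker with recordings of the individual words could still splice them together, which is where the inflection and cadence checks would have to earn their keep.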

Phrases can be disabled by the user or disallowed by the device or the Dragon ID servers after too many failed attempts, but this raises a question about how well the app resists repeated attempts.  The app surely only allows a certain number of attempts before either locking the device entirely, disabling a specific phrase, or forcing the user to authenticate with a password or some other non-voice token.  But how does it track multiple attempts?  The app is required to work even when the device is completely disconnected from voice or data networks, so there must be some form of device-resident logging.  If the device’s memory is cloned before an attack, what prevents the attacker from reflashing the device into its previous state where the counter was at 0?  There are plenty of places on a device to store counter information, and cleverer ways to do it than a simple variable in a LoginAttempts.dat file.  Is it possible to completely reset the state of the device to a set point such that an attacker could attempt authentication indefinitely?
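
Here is the rollback worry as a toy model.  Everything in it is an assumption for illustration: the Device class, the limit of three attempts, and especially the string comparison standing in for voice matching.  The point is only that a counter living in storage the attacker can snapshot and restore does not really limit anything.

```python
import copy

MAX_ATTEMPTS = 3  # hypothetical lockout threshold

class Device:
    def __init__(self):
        # The failure counter lives in ordinary, clonable storage.
        self.storage = {"failed_attempts": 0}

    def try_unlock(self, sample, expected):
        """String equality is a stand-in for real voice matching."""
        if self.storage["failed_attempts"] >= MAX_ATTEMPTS:
            return "locked"
        if sample == expected:
            self.storage["failed_attempts"] = 0
            return "unlocked"
        self.storage["failed_attempts"] += 1
        return "rejected"

def attack_with_rollback(device, guesses, expected):
    """Try every guess, restoring a saved image whenever the lockout is hit."""
    snapshot = copy.deepcopy(device.storage)   # "clone the memory" first
    restores = 0
    for guess in guesses:
        if device.storage["failed_attempts"] >= MAX_ATTEMPTS:
            device.storage = copy.deepcopy(snapshot)   # "reflash" the device
            restores += 1
        if device.try_unlock(guess, expected) == "unlocked":
            return guess, restores
    return None, restores

guesses = [f"g{i}" for i in range(10)]
winner, restores = attack_with_rollback(Device(), guesses, expected="g7")
```

In this run the eighth guess gets in after two restores, where an honest device would have locked after the third failure.  A counter held somewhere the attacker cannot roll back would change the picture, but whether Dragon ID has anything like that is exactly what I’m asking.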

Enlighten me.  I love this stuff.


Original article on GigaOm.