Real Artificial Intelligence Has Arrived: Voice Recognition

By Colin Hogan

Some of the most futuristic portrayals of technology have involved conversations between human and machine. Think of HAL 9000 from “2001: A Space Odyssey,” C-3PO or even WALL-E. All of these well-known characters could understand, interpret and produce language. This capability has long been considered one of the key hallmarks of true artificial intelligence.

Voice recognition has existed in some form since the 1950s, starting with the “Audrey” system from Bell Laboratories, which could understand only spoken digits. Since then, many other companies have worked on voice recognition, including Dragon Systems, whose DragonDictate was one of the first speech recognition products for the PC. Soon after, the telecom industry began creating voice portals, which were intended to replace customer service representatives by giving information through voice-activated menus. Instead, they became a nuisance to cellphone users looking for answers about their overage charges.

Despite these advances, the ability to truly communicate with machines through conversation remained confined to the realm of science fiction.

The latest advances in voice recognition software have found a niche in the automotive industry. Ford’s SYNC feature debuted in 2007 and has since become a popular offering across its product line. During last year’s Super Bowl, Chevrolet advertised OnStar’s ability to read live Facebook feeds aloud.

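2011 was also the year when IBM’s Watson defeated record-setting “Jeopardy!” champion Ken Jennings, thanks in part to its exceptional ability to process natural language. Watson received its clues as text rather than by listening to the host, but it showed how well a machine could interpret questions phrased the way people actually speak.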

Even with all of these advances, the ability of technology to communicate by voice remained spotty at best and was not widely available.

Along Came Siri

In October 2011, Apple became the latest company to jump into the voice recognition conversation when it launched the iPhone 4S. The most buzz-worthy feature of the newest iPhone was Siri, which became the first voice recognition software incorporated into the core of a smartphone.

Prior to the release of the iPhone 4S, Siri was available as a stand-alone app in the App Store. The technology that powered Siri was born from SRI International’s CALO project, often described as the largest artificial intelligence project in U.S. history. The project created technology that linked machine learning to natural language, which helped bridge the gap between how computers speak to one another and how humans speak to one another.

Apple acquired Siri in 2010, about two months after the stand-alone app launched, for a reported $200 million or more, and later withdrew the app from the App Store.

What sets Siri apart is its uncanny ability to understand and interpret language. Using it is by far the closest thing yet to talking to a real person. It engages you in conversation by asking for clarification, learning your habits and adapting to your needs. Siri recognizes context and can work out what you mean to say, not just what you literally say. For example, even something as seemingly simple as “tell my wife I’m running late” involves multiple processes. Within this command, Siri knows:

  1. “Tell” means send a text message.
  2. Your wife’s name and contact information, even though she’s probably not in your address book as “wife.”
  3. “Running late” means you’re not going to arrive on time and will be there soon, and not any other connotations, such as that you’re out jogging.
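
The three steps above can be sketched in a few lines of Python. This is a toy illustration only, not how Siri is actually built: the contact table, the verb-to-action mapping, the `Command` type and the `interpret` function are all hypothetical names invented for this example, and the sketch leaves out the hardest part, which is telling apart the possible meanings of a phrase such as “running late.”

```python
import re
from dataclasses import dataclass

# Hypothetical address book. The "wife" label stands in for whatever
# relationship data a real assistant would have learned about the user.
CONTACTS = {
    "wife": {"name": "Alex", "phone": "555-0100"},
}

# Hypothetical mapping from spoken verbs to phone actions.
VERB_TO_ACTION = {"tell": "send_text", "text": "send_text", "call": "place_call"}


@dataclass
class Command:
    action: str
    recipient_name: str
    recipient_phone: str
    message: str


def interpret(utterance: str) -> Command:
    """Resolve '<verb> my <relationship> <message>' into a structured command."""
    match = re.match(r"(\w+) my (\w+) (.+)", utterance.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"Could not interpret: {utterance!r}")
    verb, relationship, message = match.groups()

    action = VERB_TO_ACTION.get(verb.lower())
    contact = CONTACTS.get(relationship.lower())
    if action is None or contact is None:
        raise ValueError(f"Could not interpret: {utterance!r}")

    # Step 1: the verb picks the action. Step 2: the relationship picks the contact.
    return Command(action, contact["name"], contact["phone"], message)


command = interpret("tell my wife I'm running late")
print(command)
```

Even this simplified version shows why the command is harder than it looks: the phone has to map a verb to an action and a relationship to a person before it can send anything.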

What’s Next?

It will be interesting to see whether Apple opens Siri to other platforms and expands its capabilities to other areas. There have also been rumors of a new Apple TV that will use Siri to replace the traditional remote control. At least for now, however, it’s likely that Siri’s impact will be limited to the iPhone.

In 2012, we can expect to see more third parties build apps that take advantage of Siri’s capabilities. So far, we’ve seen a utility app that lets you speak your to-do list and set reminders. But Siri isn’t just a productivity tool. Expect to see a new generation of mobile games and entertainment that are controlled by the sound of your voice.

Since Siri is embedded into the core of the iPhone, it can integrate seamlessly with the phone’s other features, such as geo-location and the gyroscope. These features will combine to create an exciting new generation of apps.

Voice recognition now makes it easier than ever for users to interact with their phones, and there are implications for the mobile category as a whole. For example, if Siri does not know the answer to a question, it often suggests a mobile search over the Internet. As more mobile users adopt the iPhone 4S (and subsequent mobile devices), we can expect to see a rise in mobile searches.

Local search will be especially affected, since Siri’s results are determined by the user’s current location. So if you’re looking for a restaurant or business near you, you will see only those that are optimized for local search. And since Yelp recommendations are embedded in Siri’s responses, ratings and reviews will continue to play a role in local search.
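
To make the idea concrete, here is a minimal sketch of location-based ranking in Python. It is an assumption-laden illustration, not a description of how Siri or Yelp actually ranks results: the business names, coordinates, ratings, the two-mile cutoff and the `nearby` function are all hypothetical. The distance calculation uses the standard haversine formula.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    radius_miles = 3958.8  # mean radius of the Earth
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_miles * math.asin(math.sqrt(a))


def nearby(user_lat, user_lon, businesses, max_miles=2.0):
    """Keep businesses within max_miles, best rated first, nearest breaking ties."""
    results = []
    for business in businesses:
        distance = haversine_miles(user_lat, user_lon,
                                   business["lat"], business["lon"])
        if distance <= max_miles:
            results.append((business["name"], distance, business["rating"]))
    return sorted(results, key=lambda r: (-r[2], r[1]))


# Hypothetical listings; a real service would pull these from a local search index.
BUSINESSES = [
    {"name": "Diner A", "lat": 40.01, "lon": -75.0, "rating": 4.0},
    {"name": "Cafe B", "lat": 40.02, "lon": -75.0, "rating": 4.5},
    {"name": "Bistro C", "lat": 41.00, "lon": -75.0, "rating": 5.0},
]

ranked = nearby(40.0, -75.0, BUSINESSES)
for name, distance, rating in ranked:
    print(f"{name}: {distance:.2f} miles, {rating} stars")
```

In this sketch the top-rated business never appears because it is too far from the user, which is the point for brands: a listing with no accurate location data has no chance of showing up, however good its reviews.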

Beyond Siri

Beyond just the iPhone, Siri’s success has ignited a deeper conversation about the role of voice recognition in all technology. Will we finally be able to move beyond keyboards, mice and even touchscreens? Is voice recognition as an interface here to stay?

This conversation extends well beyond the ever-expanding list of smartphone features. Siri represents a paradigm shift in computing. It will change how we interact with computers, much as Apple revolutionized both the smartphone and tablet markets with its multitouch screens.

Siri started as a stand-alone app, but the vast majority of the population didn’t care until it became native to the hardware. This is another factor in the buzz around Siri. It isn’t just a feature; it’s a whole new interface. You don’t have to navigate to a specific app or phrase your question in a robotic tone. Voice can now be the interface for everything.

The possibilities are endless, and tech enthusiasts everywhere are hoping that this time, reality just might catch up with science fiction. Imagine incorporating voice recognition into every machine in your house.

Wake up to an alarm clock that tells you, on command, what the traffic is like on your route to work, so you know if you can afford to snooze for five more minutes. On your morning commute, talk radio becomes much more interactive, because the radio host is asking you questions and engaging you in a conversation to get you to wake up. When you pull up to the drive-through for lunch, the voice behind the speaker never gets your order wrong, because the machine interprets your voice perfectly.

That may sound scary to some, but it’s exciting to others. Will voice replace all other interfaces for technology? Or will it always be complemented by touch features?

Implications for Brands

How can brands create relevant customer experiences by incorporating elements of voice recognition? We’ve thought of a few examples.

You walk into a retail store and open its app, which knows the layout of the entire store. The app taps into your location and speaks to you with a customized greeting. Once you say what you’re looking for, the app directs you to the right aisle. If the product is out of stock, the app asks whether you would prefer to drive 1.8 miles (it knows) to the nearest Home Depot that has it in stock or buy it online and have it shipped to your house.

We also foresee mobile users turning to Siri to further connect the in-store experience to an online environment. For example, if you’re in a store and you see something you like, just say, “Siri, is that coat already on my Amazon Wish List? If not, add it.”

These are just a few of the ways that Siri can make branded apps come to life to provide a better consumer experience. The key to taking advantage of voice recognition is recognizing its place in the overall digital experience of the brand and not treating it as a stand-alone feature.

2012 will be the year when voice recognition goes mainstream. Expect to see even more functionality from Siri as it expands its databases and opens itself to third-party app developers.
