Thursday, May 26, 2016

AI ≠ Voice Interface

With all this drama over how Apple is lagging in "AI," I think it's worth taking a closer look at the real issues at play. There's a big difference between "AI" or "bots" and voice interface. And you don't have to be good at both.

Google has been placing its bet on "AI" (which I'm using as shorthand for the various machine learning they employ, be it a bot or a knowledge graph, etc.) for a while. Google Now could exist without any voice interactions. The idea of serving up information when you need it "intuitively" is one construct that is not necessarily tied to a voice command. That's the avenue that Google (and I think ultimately Amazon) want to chase- getting predictive in the information they serve you. Because once they figure out what you want or need before you realize it, they can sell the opportunity to fill that need to the highest bidder. That's their business model. But it's a model that requires a lot of data collection and analysis to work.

Right now that requires big computing resources that only scale in the cloud, which is partly why Google and Amazon are leading there as well. But eventually the cost of that compute will come way down and be manageable on-device. As it does, Apple can move there while maintaining their privacy ethos. And they'll still have the advantages of A) great hardware, B) a privacy stance, and C) the experience of a developed market. They won't be first, but they will be best and likely mainstream. That's what the hyper-early-adopting tech press never remembers. MOST people want this stuff when it's matured. Basic adoption curve. But even if they don't, they can make those services available through the iOS platform in ways that make them accessible to users that want them. They don't HAVE to play in that space; they just have to make it accessible.

Voice interface, on the other hand, doesn't require as much machine learning and can be achieved today without sacrificing privacy. Apple can, and I believe will, move there quickly, but it will require giving up a little control. A well-made API would allow developers to script commands based on voice inputs. The keys here are parsing and preference.
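
Just to make that concrete, here's a rough sketch of what such an API could look like. Every type name below is invented for illustration- nothing here is a real Apple API:

```swift
import Foundation

// Hypothetical types: "VoiceIntent" and "VoiceCommandRegistry" are made-up
// names for illustration; Apple has shipped nothing like this (yet).
struct VoiceIntent {
    let action: String                  // e.g. "calendar.addEvent"
    let parameters: [String: String]    // e.g. ["title": "massage", "time": "Friday 4pm"]
}

// A developer registers handlers for the actions their app supports;
// the system does all the speech-to-intent work and simply calls dispatch().
final class VoiceCommandRegistry {
    private var handlers: [String: (VoiceIntent) -> Bool] = [:]

    func register(action: String, handler: @escaping (VoiceIntent) -> Bool) {
        handlers[action] = handler
    }

    func dispatch(_ intent: VoiceIntent) -> Bool {
        return handlers[intent.action]?(intent) ?? false
    }
}
```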

Preference is the easy one if Apple is willing to bend. For example, I use a 3rd-party calendar app (Calendars 5 by Readdle, and I love it). So let's assume Apple makes a Siri API that Readdle builds to, so that when I say "Siri, add a massage to my calendar at 4pm on Friday" it's able to go directly to the Readdle app. The challenge is: how does it recognize that preference? One option- the "AI" option- is to use my phone's usage data to know intuitively that I use Calendars 5 and never open the stock app. That's the harder way for Apple but (hopefully) seamless for the user. But it requires data. The other way is super simple if you'll give up the control- let me set a preference for "default" apps. I need to be able to tell Siri in a settings panel that this is my calendar app, this is my Twitter app, this is my notes app, etc. So that when I tell Siri to "make a note" she knows I mean an Evernote note, not an Apple Note. That's the easy, privacy-based solution. I think that's where Apple will go. Yes, it means letting us tinker with default apps and yes, it means another settings menu; but it also means privacy integrity, and that's the bigger Apple value.
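
Here's a minimal sketch of that settings-panel approach, assuming a hypothetical preference store. The type and the bundle identifiers are mine, not anything Apple or Readdle has shipped:

```swift
// "DefaultAppStore" is hypothetical, and the bundle identifiers below are
// illustrative guesses, not confirmed values.
enum ServiceCategory {
    case calendar, notes
}

struct DefaultAppStore {
    // Set explicitly by the user in a settings panel - no usage data,
    // no inference, so there's nothing to collect or analyze.
    private var userChoices: [ServiceCategory: String] = [
        .calendar: "com.readdle.calendars5",  // hypothetical bundle ID
        .notes:    "com.evernote.Evernote"    // hypothetical bundle ID
    ]

    // "Make a note" routes here: the user's pick, or else the stock app.
    func app(for category: ServiceCategory) -> String {
        return userChoices[category] ?? stockApp(for: category)
    }

    private func stockApp(for category: ServiceCategory) -> String {
        switch category {
        case .calendar: return "com.apple.mobilecal"
        case .notes:    return "com.apple.mobilenotes"
        }
    }
}
```

Note how little machinery this needs compared to the inference route- a dictionary the user fills in, not a model trained on their behavior.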

Parsing is the real challenge. Natural language is incredibly difficult. Parsing idioms and context is an incredibly advanced problem. Even the Amazon Echo isn't good at it- if you don't get the "command line" correct, it will reject the direction. And yet natural language, more than AI, is what will drive adoption of the voice interface. Even a casual user would have an easier time picking "default apps" with a good UI guide if it means they can say anything once it's set up. This is actually Apple's advantage- they're good at natural language- but translating that into an API is no small feat. How do you get a partner to understand the 10+ different ways an English speaker might ask for the weather? ("How's the weather?" "What's the weather?" "What's it like outside?" "What's the temperature?" etc., etc.) Siri is quite good at giving you what you ask for and parsing those linguistic quirks. Any API has to bridge that, so that when I say "Check me in here" it knows A) I mean Foursquare, not Yelp, and B) I just want to be checked in; not with a picture or a rating or anything else. (Beyond that, there are greater challenges. How do you eventually handle duplicate service requests? "Check me in here on Foursquare AND Yelp AND Facebook," for example. It gets VERY complex, but it can be done and it can be done on-device.)
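
As a toy illustration of why- real parsing is vastly harder than a lookup table, but this shows the shape of the problem any API has to hide from the developer:

```swift
import Foundation

// A hard-coded phrase list is nowhere near real natural-language parsing;
// it just demonstrates mapping many phrasings onto one intent.
let weatherPhrasings: Set<String> = [
    "how's the weather",
    "what's the weather",
    "what's it like outside",
    "what's the temperature"
]

// Normalize the utterance and match it; the app only ever sees the intent
// name, never the raw phrasing.
func intent(for utterance: String) -> String? {
    let normalized = utterance.lowercased()
        .trimmingCharacters(in: .whitespacesAndNewlines)
        .trimmingCharacters(in: CharacterSet(charactersIn: "?.!"))
    return weatherPhrasings.contains(normalized) ? "weather.current" : nil
}

// intent(for: "What's it like outside?")  -> "weather.current"
// intent(for: "How's the weather?")       -> "weather.current"
```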

Voice interface is about getting my device to do the same things it would if I were using my hands. It's about navigation and action. That's no more a privacy sacrifice than what I do with my phone today. It's an input/UI problem, not a data one. Which is why I think Apple will quickly catch up and in many ways lead. They own the software that controls the hardware. They're well positioned to solve interface challenges better than anyone.

But in the end, the key here is recognizing the difference between "AI" and voice interface.  I'm confident Apple does.  Not so sure about the average pundit.
