iPhone 4s and Siri

Seems as though every year Apple tries to come up with some major improvement to the iPhone, not only to sell phones to newcomers, but to entice the folks who upgraded just last year to throw up their arms, say “I gotta have it!”, and do an early upgrade, which a fair number of iPhone users do.  The first generation iPhone brought us inertial scrolling, rubber banding, and, don’t forget, a phone, an internet device, and an iPod all rolled into one.  The second generation brought badly needed 3G connectivity into the mix, worthy of being the driving force behind the iPhone 3G.  The third generation added a compass as well as a faster processor to give the iPhone the power it needed to continue to grow.  The fourth generation brought a still faster device that added a gyroscope to improve gaming, plus a radical redesign of the form factor that not only improved the phone’s “phone” function with a completely outside-the-box approach, but also made it super thin, with a level of build quality the industry hadn’t seen since perhaps the Motorola RAZR V3.

This time around, the iPhone 4s retains last year’s awesome form factor.  Personally, I see no problem with that.  You can’t come up with a groundbreaking redesign every year, last year’s radical redesign is only a year old, and some tweaks have been made to the design; they’re not quite so visible, but they’re there.  Internally, though, it’s a whole new phone with the A5 processor, as expected, so despite looking much the same as last year’s model, much is different, and significantly improved too.

Siri… the big teaser by far this year is the introduction of Siri.  iPhone 3GS/4 owners may already be vaguely familiar with the idea from “Voice Control” on those phones.  Voice Control seems to have been the initial foundation for Siri: you can say “Play music by the Beatles” or “Call Phil” in pretty much the same way you can with Siri, and you invoke it the same way, by holding down the home button.

AI has been around for a LONG time, and many systems have come and gone with various levels of success.  I think overall, Siri drags AI forward a bit, but just a bit.  That’s not to take away from what Apple’s done by any stretch; the “magic” behind Siri isn’t all that new, but how Apple has integrated it into the OS is where the magic comes from.

Basically, three things happen when you talk to Siri.  First, what you say is converted from audio to text.  Then a parser reads that text and figures out what you’re trying to do.  Finally, once the parser has figured out what you want, and here’s where the magic comes in, Siri is integrated into the OS, so it can call up the same apps you use and interact with them just as you do.  For example, sending a text message…

You say to Siri, “I want to send a message to Jim telling him I’ll be 15 minutes late.”

The first portion of Siri simply takes that recorded audio and converts it to the exact text you see above.  Not that that’s easy or anything, but speech-to-text work is pretty advanced, and combined with how close the microphone is to the person speaking, plus the device’s built-in noise-cancellation hardware and software, it means Siri does this part of the job really well.
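If it helps to picture the whole flow, you can think of it as three functions chained together.  This is purely a toy mental model, not anything Apple has published, and every name in it is made up; the interesting parts are stubbed out until the next two steps.

def speech_to_text(audio):
    # Step 1: turn the recorded audio into plain text.  The real work here is a
    # mature speech-recognition engine; a hard-coded stub stands in for it.
    return "I want to send a message to Jim telling him I'll be 15 minutes late"

def parse(text):
    # Step 2: split that text into an action and its data (sketched below).
    ...

def dispatch(parsed):
    # Step 3: hand the result to the right app (sketched further down).
    ...

def handle_request(audio):
    return dispatch(parse(speech_to_text(audio)))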

The second portion of Siri, the parser, is fed that same text, and using various algorithms it throws out useless stuff like “I want to” (that’s implied) and keeps the important bits like “send message”, “Jim”, and “I’ll be 15 minutes late”.  So Siri looks at everything you say and separates it into two classes: actions and data.  In this example, “telling him” is your way of saying “this is the body of the text message” to Siri.  You could just as easily have said “Send a text to Jim saying I’ll be 15 minutes late.” and it would have worked just as well.  To Siri, “Send a text”, “Send a message”, “New text message”, etc., all have the same meaning: they tell Siri the text messaging app is what we’re using.  Likewise, “telling him that”, “telling her that”, “saying”, “saying to him”, “saying to her”, etc., all mean the same thing (what follows is the message body).
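Just to make the action/data split concrete, here’s a toy sketch of what a parser like that might do.  It’s nothing like Apple’s real implementation; the phrase lists and names are entirely my own, and it only knows about one action.

import re

# Phrases that all mean "use the Messages app" (the action).
ACTION_PHRASES = ["send a message to", "send a message", "send a text to",
                  "send a text", "new text message to"]

# Phrases that mean "everything after this is the message body" (the data).
BODY_MARKERS = ["telling him that", "telling her that", "saying to him",
                "saying to her", "telling him", "telling her", "saying"]

def parse(utterance):
    text = utterance.lower().strip().rstrip(".")

    # Throw out filler like "I want to" -- it's implied.
    text = re.sub(r"^(i want to|i'd like to|please)\s+", "", text)

    # Find the action phrase (longest first, so "send a message to" beats
    # "send a message") and strip it off the front.
    for phrase in sorted(ACTION_PHRASES, key=len, reverse=True):
        if text.startswith(phrase):
            text = text[len(phrase):].strip()
            break
    else:
        return None   # not something this toy knows how to do

    # Everything before the body marker is the recipient; everything after
    # it is the data (the message body).
    for marker in BODY_MARKERS:
        if f" {marker} " in f" {text} ":
            recipient, body = text.split(f" {marker} ", 1)
            return {"action": "send_message",
                    "recipient": recipient.strip().title(),
                    "body": body.strip()}
    return {"action": "send_message", "recipient": text.title(), "body": ""}

print(parse("I want to send a message to Jim telling him I'll be 15 minutes late."))
print(parse("Send a text to Jim saying I'll be 15 minutes late."))

Both of those come back as the same action (“send_message”), the same recipient (“Jim”), and the same body, which is the whole point: different wordings, one meaning.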

Third, once Siri knows what app we’re using and what data we’re sending to it, it opens that application and hands the data over, just as you would by tapping it in on the touch screen.
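A sketch of that last hand-off might look like this (again, the names are my own invention, and the real Siri shows you the drafted message and asks before anything actually goes out):

def send_message(recipient, body):
    # Stand-in for the Messages app: draft the text and ask for confirmation,
    # the same screen you'd reach by tapping it out yourself.
    print(f"To: {recipient}")
    print(f"Message: {body}")
    print("Ready to send it?")

# Each action the parser can recognize maps to the app that handles it.
HANDLERS = {"send_message": send_message}

def dispatch(parsed):
    handler = HANDLERS.get(parsed["action"])
    if handler is None:
        print("Sorry, I can't help you with that.")   # outside Siri's small scope
    else:
        handler(parsed["recipient"], parsed["body"])

dispatch({"action": "send_message",
          "recipient": "Jim",
          "body": "I'll be 15 minutes late"})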

Siri works because of the limited scope of what it was designed to be good at: basically text messages, maps, reminders, contacts, weather, etc.  Because its intended use is limited to this fairly small scope, it can be heavily tuned to work well in its area of expertise.  Apple also gave it multiple responses to the same answer.  For example, when you ask if it’s going to rain, instead of a straight, robotic “Yes” or “No”, Siri will respond with different phrases, particularly fuzzy, human phrases like “I don’t think so” or “probably not”.  It’s these little touches, combined with Siri’s limited scope, that give Siri that “smarter than I really am” look and feel, and they will, in my opinion, really serve to continue to set the iPhone apart from the rest.
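That trick of varying the wording is simple to pull off, which is sort of the point: a few canned phrases and a random pick get you most of the way there.  My own toy example again, obviously not Apple’s code.

import random

DRY_REPLIES = ["I don't think so.", "Probably not.",
               "It doesn't look like rain today."]
WET_REPLIES = ["It looks that way.", "There's a good chance of rain today.",
               "You might want to bring an umbrella."]

def rain_reply(rain_expected):
    # Same yes/no answer, different human-sounding phrasing each time.
    return random.choice(WET_REPLIES if rain_expected else DRY_REPLIES)

print(rain_reply(False))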