When Amazon announced the first smart speaker in 2014, I was unambiguously enthusiastic about its potential with kids. I immediately approached Amazon, and later Google, about creating skills/actions for Alexa and Google Home. Sadly, both companies told me that they weren’t interested in children’s content because of privacy/COPPA concerns. Since my company couldn’t count on support from the platforms, and there wasn’t any way to monetize our development efforts, I didn’t pursue the opportunity.

Fast forward to today. Amazon just released a new child-friendly Alexa device, called Echo Dot Kids, with a monthly subscription plan and parent-friendly features. Google Home has a huge library of family-friendly content. What accounts for the turnaround? Late last year the Federal Trade Commission issued guidance on COPPA that essentially looks the other way when companies collect voice recordings of children under the age of 13, stating that the “FTC would not take an enforcement action,” as long as companies use an audio file to transcribe a command and then immediately delete it. Adding to the momentum, Amazon and Google have disclosed that families love the devices, describing parents as “voice-assistance power users.”

Smart speakers (also known as Voice Assistants, or VAs) have enormous potential with kids. They use Speech Recognition, Natural Language Processing, Artificial Intelligence, and Machine Learning to understand what is being said and how to respond. They engage and empower kids with endless audio interactivity. Smart speakers support a natural way of interacting that doesn’t require reading, staring at a screen, or interpreting interfaces. Unlike parents glued to their cell phones, a Voice Assistant always reacts to the child and says something, even if it isn’t actually an answer to the question they asked. And there’s the rub.

Children neither speak nor think like adults. What are some of the unique issues that arise when designing voice interactions for kids?

  • To begin with, we don’t really understand what children are thinking when talking to a disembodied voice. What is their “theory of mind” as they stand in front of a Google Home or Echo device? A charming 2017 study from MIT Media Lab, “Hey Google is it OK if I eat you?”, revealed significant developmental differences. A younger child (4 years old) treated Alexa like a person, asking questions like “What’s your favorite color?” and “How old are you?” Older kids (7, 8, 9 years) treated it more like “a robot in a box,” believing that the smart speaker could learn from their mistakes. Knowing that young children think of the smart speaker as a human gives one pause. Will they “take it personally” when the voice assistant continually responds incorrectly to a query?
  • We know that memory for audio-only information is relatively poor among kids, especially compared to reading or watching a story. This was one of the key findings of my doctoral dissertation research many moons ago and should obviously be considered in the design of audio-only interactive content. Despite that, the first skill I helped design was a simple branching detective story, in the spirit of Encyclopedia Brown. We conveyed hints about “whodunit” throughout the story. It turned out there were way too many puzzle pieces to hold in short-term memory. We realized that we had to write a simpler story, break it up into much smaller bits, and continually reinforce important information.
  • Egocentrism is a key feature of childhood. Kids have difficulty putting themselves in the place of another person and seeing the world from another perspective. It is not until children are about 11 years old that they begin to accept the limitations of their knowledge and understand that their knowledge is not the same as others’. Asking a question of a Voice Assistant like Alexa, Siri, or Google requires them to adhere to some fairly strict conventions and to understand that they may have to adjust the way they ask a question in order to be understood. For example, when I asked, “How does an ice cream truck sound?” I got a charming response from my Google Home device (try it!). But when my four-year-old grandson asked what he thought was the same question, “Play me an ice cream truck,” we ended up with some rap music on Spotify. Getting Izzie to rephrase his question, to think about how to ask it differently in order to be understood by Google, was a tall (impossible) order. When we received a complicated answer from Wikipedia in response to a question about cement mixers, Izzie walked away.

Creating a smart speaker/Voice Assistant for kids that can understand and respond satisfactorily strikes me as a true Turing Test, and one that we haven’t yet passed. While adding kid-centric skills/actions is a good first step, smart speakers are still more frustrating than they should be, both in understanding children’s speech and intent and in their responses, or lack thereof.