I have written articles and commented quite a bit about Amazon Alexa and voice-based conversational experiences in the media over the past 12 months.
To date there are over 10 million Alexa-powered devices in consumer homes, and that number is about to increase significantly as the Alexa Voice Service is integrated into everything from cars, such as Ford's Sync 3 system, to mobile handsets.
Here is an example of Alexa integrated into Ford's Sync 3 system, rolling out in various Ford models this fall.

Skills are to Alexa what apps are to mobile. When I first met with the Amazon Alexa partner team a year ago, there were barely 1,000 skills published. As of today there are over 10,000, and that number continues to increase.
Beyond skills, the shift towards voice-based experiences has already begun. In 2014, voice search traffic was negligible. Today it exceeds 10% of all search traffic, and virtual assistants handle more than 50 billion voice searches per month.

That number will continue to accelerate: by 2020, more than 200 billion searches per month are projected to be done by voice. Voice will quickly become a key horizontal channel and central to a converged user experience.

What most don’t realize, though, is that while most experiences today are zero-UI, voice-only experiences, the next evolution of voice-based systems will be voice + paired visual experiences.
This will ultimately be driven by new hardware that integrates screens, but initially it will be driven by hands-free responsive web experiences powered by Alexa.
Soon virtual assistants such as the Sony XPERIA Agent, shown here at MWC 2017, will have integrated screens to enhance voice + visual experiences.

Voice-based skills will be able to showcase information visually by aligning voice intents with visual cues, creating a seamless voice-controlled experience.
From dynamic content to video, an Alexa skill can answer a query and showcase complex or highly visual elements, such as what a recipe should actually look like versus having to picture it in one's mind.
Visual cues on the page can also expand what a user can do with Alexa by surfacing related intents, such as repeat, help, and next steps, via a responsive web experience.
This addresses one of the challenges with pure voice experiences: the user doesn't always know what their options are to further engage with different aspects of a given skill.
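To make the idea concrete, here is a minimal sketch of how a skill's intent handler could return both the spoken response and a cue for the paired web page to render. The request and response shapes loosely mirror the Alexa Skills Kit JSON, but the `visual_cue` side channel, the intent names, and the cue payloads are all hypothetical assumptions for illustration, not an official Alexa API.

```python
# Hypothetical sketch: pairing Alexa voice intents with visual cues
# for a responsive web experience. The "visual_cue" field is an
# assumption -- a payload a companion web page could consume; it is
# not part of the Alexa Skills Kit response format.

# Map each intent to its spoken answer and the visual cue the paired
# web page should render (all names here are illustrative).
INTENT_MAP = {
    "RecipeIntent": {
        "speech": "Here is the recipe. The finished dish is shown on screen.",
        "visual_cue": {"type": "image", "id": "finished-dish"},
    },
    "AMAZON.HelpIntent": {
        "speech": "You can say repeat, next step, or show me the recipe.",
        # Surface related intents on screen so the user knows their options.
        "visual_cue": {"type": "menu", "options": ["repeat", "next step", "recipe"]},
    },
}

def handle_request(event):
    """Return speech for Alexa plus a cue for the paired web page."""
    intent_name = event["request"]["intent"]["name"]
    entry = INTENT_MAP.get(intent_name, {
        "speech": "Sorry, I didn't catch that. Try saying help.",
        "visual_cue": {"type": "menu", "options": ["help"]},
    })
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": entry["speech"]},
            "shouldEndSession": False,
        },
        # Assumed side channel delivered to the responsive web view.
        "visual_cue": entry["visual_cue"],
    }
```

The design point is simply that every voice intent carries a matching on-screen cue, so the web page always reflects what the user just said and hints at what they can say next.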
Voice + visual can also enhance long-term engagement, which is currently the biggest barrier for Alexa experiences. By combining visual and voice content, it becomes feasible to extend into entertainment mediums that can be controlled and enhanced via voice.
Voice + visual also affects the type of data that can be gleaned from progressive profiling, and it opens up new ways to deploy existing content assets into a virtual-assistant-driven journey.
I have seen the future through a first-of-its-kind example of voice (Alexa) + visual (responsive web), and it is mind-blowing. I can't show it publicly yet, but it will reframe your approach to voice-based strategy.
I will update this post with visuals once the first voice + paired visual experience skill is published shortly.
Follow Tom Edwards @BlackFin360
