Over the past year I have focused my research on the shift toward conversational experiences and what consumers expect from them. The research has been covered by Adweek, and it's fascinating how willing consumers are to engage with and adopt these experiences as long as they are easy to use and convenient.
One flavor of conversational experience is the voice-based user experience. I recently visited Amazon HQ in Seattle and wrote about my time with the newly formed Amazon Alexa partner team and the rise of voice-based user experiences.
Since that article was published, I have seen client interest in and demand for voice-based concepts and skill creation rise as our brand partners recognize the potential of voice-based systems.
Here is a slide from a recent client presentation. Almost every meeting over the past few months has included a discussion of voice-based UI.
I strongly believe that over the next few years we will see a convergence in which elements that enable connection, such as social messaging and voice-based conversational user experiences, combine with cognitive computing (AI) and immersive experiences such as holographic computing to become interconnected and redefine how we approach connecting with consumers.
Voice-based experiences will play a key role during this time, as our interactions with connected systems, and the rise of microservices as a primary mechanism for navigating a hyper-connected world, become the new normal.
We will see services such as Alexa Voice Services quickly proliferate across third-party devices, from in-home IoT systems to connected vehicles, and "skills" will become a key component of how we navigate beyond screens. Estimates already show over 28 billion connected devices by 2019.
Developing voice-based experiences differs greatly from developing visually driven ones. Visual experiences provide immediate context and cues that can guide the user and enhance the experience.
Here are five emerging voice UI design patterns the Amazon team and I discussed, along with best practices and points to consider when designing voice-based skills.
1) Infinitely Wide Top-Level UI
With a mobile user experience, users have the benefit of visual cues, be it a hamburger menu or on-screen prompts, that can guide their actions. With voice-based UI, the top level of the interface is infinitely wide: the user can say anything. Here are a few best practices for designing for an infinitely wide top level.
Don't assume users know what to do – The first time a voice skill is initiated, provide additional detail and tell users what options they have for interacting with your experience.
Expect the Unexpected – Unlike visual interfaces, there is no way to limit what users can say in a speech interaction. Plan for reasonable things users might say that are not supported, and handle them intelligently.
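As a minimal sketch of these two practices, here is hypothetical skill logic (illustrative only, not the official Alexa Skills Kit SDK) that routes recognized intents and falls back gracefully on anything unsupported. The intent names and responses are invented for the example.

```python
# Hypothetical intent routing for a voice skill: handle what you support,
# and respond helpfully to everything else instead of failing silently.

SUPPORTED_INTENTS = {
    "GetFactIntent": "Here is today's fact.",
    "AMAZON.HelpIntent": "You can ask me for a fact, or say stop to exit.",
}

def handle_intent(intent_name):
    """Return the spoken response for an intent, with a graceful fallback."""
    if intent_name in SUPPORTED_INTENTS:
        return SUPPORTED_INTENTS[intent_name]
    # Expect the unexpected: steer the user back toward supported options.
    return ("Sorry, I can't help with that yet. "
            "You can ask me for a fact, or say help.")
```

The fallback response doubles as onboarding: even a misrecognized or unsupported request teaches the user what the skill can do.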
2) Definitive Choices – The key to successful voice UI design is to make the next consumer action clear. Consumers will not always say exactly what they want, so it is incredibly important to map intent beyond the normal function of a skill. For example, a consumer may end a session by uttering "done," "quit," and so on, and the skill needs to provide a clear path for ending the session. Here are additional points to consider.
Make it clear that the user needs to respond – Ask the user a question vs. simply making a statement.
Clearly present the options – Prompts are very important, especially if the question set is an either/or vs. yes/no.
Keep it Brief – Speech is linear and time based. Users cannot skim spoken content like visual content. Quick decisions are key, so voice based prompts should be short, clear and concise.
Avoid too many choices – Make sure choices are clearly stated, do not present more than three at a time, and avoid repetitive words.
Use Confirmation Selectively – Avoid dialogs that create too many confirmations, but confirm actions of high consequence.
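The points above can be sketched as a response payload, assuming the general shape of an Alexa-style JSON response (output speech plus a reprompt). The helper name and the prompt text are hypothetical.

```python
def build_prompt(question, reprompt, end_session=False):
    """Build an Alexa-style JSON response that asks a clear question
    and supplies a reprompt for when the user stays silent."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": question},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            # Keep the session open so the user can answer the question.
            "shouldEndSession": end_session,
        },
    }

# Brief, no more than three choices, and phrased as a question:
resp = build_prompt(
    "Would you like sports, weather, or news?",
    "You can say sports, weather, or news.",
)
```

Ending the speech with a question, and keeping the session open, is what signals that a response is expected; the reprompt restates the same three options rather than introducing new ones.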
3) Automatic Learning
One of the areas I am most excited about over the next few years is the intersection of artificial intelligence and the ability to apply machine learning and other higher-level algorithms to create more personalized experiences. For voice-based UI, it is important to understand how sessions can persist over time.
Obtain one piece of information at a time – Users will not always give all of the required information in a single step. Ask for missing information step by step, and favor a progressive-profiling strategy over lead capture.
Develop for Time Lapse – Skills can allow sessions to persist with end users for hours or days, which allows more data to be collected across sessions.
Personalize Over Time – As sessions persist and users interact with skills it is possible to further personalize the experience over time based on previous interactions.
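A minimal sketch of progressive profiling, assuming persisted session attributes carried across turns (the field names and prompts are invented for illustration): the skill asks for exactly one missing piece of information per turn instead of one long form.

```python
def next_missing_slot(profile, required=("name", "city", "favorite_team")):
    """Return the first required field the user hasn't provided yet,
    or None once the profile is complete (progressive profiling)."""
    for field in required:
        if not profile.get(field):
            return field
    return None

# One short question per missing field, asked one turn at a time.
PROMPTS = {
    "name": "What's your name?",
    "city": "Which city are you in?",
    "favorite_team": "What's your favorite team?",
}

# Attributes persisted from earlier turns or earlier sessions:
session = {"name": "Ada"}
field = next_missing_slot(session)
```

As sessions persist, the same completed profile becomes the input for personalization: each return visit starts from what is already known rather than from zero.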
4) Proactive Explanation
With traditional visual design, a user can open a web page or a mobile app and the information design shows them what to do. With voice there is no page, so clearly articulating definitive choices, and providing proactive explanations such as tutorials or help, is critically important to reduce user frustration.
Offer help for Complex Skills – If a skill performs more than three functions, do not overload a single prompt. Present the most important information first, along with the option of a help session.
Make sure users know they are in the right place – In speech only interactions, users do not have the benefit of visuals to orient themselves. Using “landmarks” tells users that Alexa heard them correctly, orients them in the interaction and helps to instill trust.
Use Re-Prompting to Provide Guidance – Offer a re-prompt when an error is triggered, including guidance on next steps.
Offer a way out if the user gets stuck – Add instructions to the help session, such as "You can also stop, if you're done."
Don’t blame the user – Errors will happen. Do not place blame on the user when errors happen.
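These error-handling practices can be sketched as one hypothetical re-prompt function (the wording and escalation threshold are assumptions, not a prescribed pattern): guidance escalates as misses repeat, the phrasing never blames the user, and the exit is offered explicitly.

```python
def error_reprompt(error_count):
    """Build a re-prompt that guides without blaming, and that
    offers a way out after repeated recognition errors."""
    if error_count <= 1:
        # First miss: take responsibility and restate the options.
        return "Sorry, I didn't catch that. You can say sports or weather."
    # Repeated misses: keep guiding, and make the exit explicit.
    return ("I'm still having trouble. You can say sports, say weather, "
            "or say stop if you're done.")
```

Note the phrasing is "I didn't catch that," not "you said that wrong": the skill owns the error while still steering the user to a recoverable next step.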
5) Natural Dialog
Research shows that people are "voice activated": we respond to voice technologies as we respond to actual people. This makes crafting voice-based narratives incredibly important, as the dialog needs to be natural, consumable and written for the ear, not the eye. Here are a few key points to consider for enhancing natural dialog within a skill.
Present information in consumable pieces – People retain only a small amount of what they hear, so present only what is absolutely required to keep the interaction as short as possible.
Chunk longer lists – Break longer lists into groups of three to five items, and ask the user if they want to continue after each chunk.
Write for the Ear, not the Eye – The prompts written for voice-forward experiences will be heard, not read, so it’s important to write them for spoken conversation. Pay attention to punctuation.
Avoid Technical & Legal Jargon – Be honest with the user, but don’t use technical jargon that the user won’t understand or that does not sound natural. Add legal disclaimers to the Alexa app for users to read and process.
Rely on the text, not stress and intonation – Use words to effectively convey information. It is not possible to control the stress and intonation of the speech. You can add breaks but cannot change elements such as pitch, range, rate, duration and volume.
Clarify Specialized Abbreviations and Symbols – If an abbreviation, phone number or chemical compound is somewhat specialized, test the text-to-speech conversion to see whether additional steps are needed.
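Several of the points above map onto SSML, the markup Alexa skills can return instead of plain text: `<break>` inserts explicit pauses between chunks, and `<say-as interpret-as="telephone">` makes text-to-speech read a phone number digit by digit. A small sketch (the helper function and sample strings are invented for illustration):

```python
def ssml(text_parts, pause="500ms"):
    """Join spoken chunks with explicit SSML pauses and wrap the result
    in the <speak> root element that SSML output requires."""
    body = f'<break time="{pause}"/>'.join(text_parts)
    return f"<speak>{body}</speak>"

# say-as tells text-to-speech to read the number digit by digit.
phone = '<say-as interpret-as="telephone">2065550100</say-as>'
speech = ssml(["Thanks for calling.", f"Our number is {phone}."])
```

Breaks and say-as hints like these are exactly the "additional steps" worth testing when an abbreviation or number does not convert cleanly on its own.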
One final takeaway regarding the Alexa voice-based system is its proximity to transactions and list creation via Amazon's core services. Combined with six years of development behind Alexa Voice Services and the rising partner ecosystem, these are all signals pointing toward the convergence of connection, cognition and immersion.
Follow Tom Edwards @BlackFin360