With digitization, not only are processes and business models changing, but also the way people interact with computers, devices, and machines. In 2017, the smartphone celebrated its 10th birthday; within a single decade, it has revolutionized our use of technology. Smartphones, tablets, and smartwatches have quickly established themselves in both our private and professional lives and enjoy great popularity.
People are swiping, tapping, and even talking to their smart devices. However, speech recognition and natural language understanding, which we see evolving in various devices, are still in their infancy, yet they bear tremendous potential to shape human-computer interaction: voice interaction (i.e. speaking and listening to machines) is much faster than text interaction (typing and reading) and can thus enable new use cases in the professional environment and in corporate software.
| Receptive voice processing | Pro-active voice processing |
| --- | --- |
| listen 👂 | talk 🗣 |
| read 📖 | write ✍️ |

The four skills according to Huneke and Steinig (1997, p. 91)
Listening and speaking are part of verbal communication and belong to the phonetic representation of language: we communicate directly and in real time through acoustics, and our linguistic processing is fast, which increases reaction speed in the interaction.
Voice control: From natural language to the user’s intent
Computers learn to understand our language and respond to our questions.
With the advances in speech recognition and understanding, language-based interaction with computers, devices, and machines has recently become possible. As a result, it is no longer necessary to click through menu structures or type textual input to accomplish the actual goal of operating the software.
What usually happens is that spoken language is first transcribed to text using voice-recognition and speech-to-text technology. The text is then processed by algorithms that extract information and data – and derive the user's intention, also called the "intent".
Powered by speech recognition and natural language understanding, an employee no longer needs to click 12 times to query a budget from an ERP system. Instead, they can simply ask: "What is my budget?" An intelligent system derives the underlying intent, and the user receives the desired answer.
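The mapping from transcribed text to an intent can be sketched with a deliberately simple, hypothetical keyword matcher. Production NLU systems use trained statistical models rather than keyword lookup, and the intent names below (`query_budget` and so on) are illustrative assumptions, not part of any real product:

```python
# Hypothetical sketch of intent extraction from a transcribed utterance.
# Real NLU systems use trained models; keyword matching only illustrates
# the idea of mapping free text to a named "intent".

# Each intent is described by the keywords expected in the utterance.
INTENTS = {
    "query_budget": {"budget"},
    "query_vacation_days": {"vacation", "days"},
}

def extract_intent(utterance):
    """Return the first intent whose keywords all occur in the utterance."""
    tokens = set(utterance.lower().strip("?!. ").split())
    for intent, keywords in INTENTS.items():
        if keywords <= tokens:  # all keywords present
            return intent
    return None
```

With this sketch, the spoken question "What is my budget?" resolves to the `query_budget` intent, which the backend could then translate into an ERP query.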
Context-based interaction: Voice commands alone are not conversational
The mere fact that language assistants or digital assistants can recognize speech and "respond" to it does not yet amount to a conversation between a human and the assistant. It is crucial that digital assistants understand the context and can respond to it. However, understanding context is not an easy task. People usually understand the context of a conversation better if they have known their conversation partner for some time and already know the topic or background of the conversation. Digital assistants, on the other hand, do not readily understand context: they have to be trained with large amounts of data, and even then it cannot be guaranteed that they develop a genuine understanding of context.
The following visualization shows that personal conversation between people is still the most efficient way to establish shared context. Digital assistants, voice assistants, and chatbots currently struggle to understand and make sense of context.
Especially in the professional environment, it is very important that the context of a conversation is understood in order to avoid serious misunderstandings. It would also help establish this technology if voice assistants understood not only the content of a conversation, but also its overall theme and who says what. Depending on the theme, the meaning of what is said can change – irony is a classic example.
Reasons for the use of speech technologies
There are many reasons for using speech recognition and speech synthesis technologies. Above all, the faster response time is the most frequently cited reason for using voice technologies in daily life. Other reasons include the fun factor, mobility behavior, and situations in which our hands are busy or dirty.
However, voice interaction does not always make sense: in a large open-plan office, communicating by voice is not always practical, and using voice commands in a meeting can disturb the other participants. Speech recognition and understanding is therefore not always the ideal mode of interaction.
Meet your AI-assistant for business: Neo texts and talks to you
A digital AI assistant like Neo, which can be controlled by text and voice, can be used in many ways. Neo offers both forms of interaction (text and voice) so that users can choose the option that suits the application and the situation. Neo can help in sales, human resources, finance, customer service, and project and quality management. Every user can work with their preferred form of interaction without limiting themselves.
If you would like to use a digital AI assistant in your company, get in touch with us and we will discuss your use cases with you!