This post is based on a presentation I gave in front of the Federal Ministry of Education and Research – dept. 524 Human-Technology-Interaction in April 2018 at Fraunhofer Institute Stuttgart.
We’re developing Neo, the digital assistant for business. Hence, we face the challenges and limitations of conversational interfaces, and of voice in particular, on an almost daily basis. Based on our work with pilot customers and early adopters, we identified 5 themes that will define and shape the human-computer interaction of the future. Contrary to what the abilities and intelligent appearance of today’s digital assistants suggest, there are many important, unanswered questions along the way.
Why developing a digital assistant is the way to go
Our AI-based assistant Neo supports employees in getting tedious tasks done and interacting with complex software systems. Users ask questions and get answers – or delegate their tasks to Neo. The applications of such a system range from simple database queries to assisting with processes (such as calling in sick). “Hey Neo, what’s my budget?” is a question you can ask our assistant, and he’ll pull your current budget from SAP and present it in plain and simple terms. Better yet: instead of waiting to be asked, he’ll pro-actively inform you if you’re likely over- or underspending on your budget! “Hey Neo, I cannot come to work today. I’m sick.” is another task you can hand over to Neo. He will inform your managers, reschedule your meetings and tell you whether you need to get a doctor’s note.
With or without software – the struggle is real
Marc Andreessen famously said that “software is eating the world“. And in fact, software is part of our everyday lives: whether that’s online shopping and messaging, or complex ERP software in our professional life, getting things done without using software is just about impossible. When it comes to using software, however, we run into difficulties: we have to learn to speak the software’s language and operating logic. Teaching others to use software is an entire industry and profession.
From terminal commands to pinching our touch screens
Not too long ago, we were using shell commands to interact with computers: green pixels just waiting for your keyboard input. With the introduction of the graphical user interface (GUI) and Microsoft’s triumph, computers were soon to be found in every household – and were actually usable! Equipped with a computer mouse, users were able to click through menus and trigger actions. Even today, the GUI is the prevalent interface on smartphones and tablets, operated by touching the screen. Although this seems like a quantum leap, upon closer inspection you’ll find us still adapting to a system’s operating logic – and still having to learn “commands” to use our devices and software.
Why software’s able to listen and talk: Conversational interfaces
What’s changing is that we can “talk” to systems in our own language. With the technological progress of the past decades, computers have started to learn and understand our language. That allows us to – instead of clicking 12 times through a hierarchical menu of our ERP system to get our budget – simply ask a question (e.g. “what’s my current budget?”), and the software is able to recognize our intent based on that question: what are you trying to achieve? This also means that you no longer have to learn a special set of vocabulary or commands. Instead, you can write – or talk – to your software!
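To make this a bit more tangible, here’s a minimal sketch of intent recognition using scikit-learn. The utterances, intent names and model choice are invented for illustration – this is not Neo’s actual setup, which uses far more training data and a richer model.

```python
# Minimal sketch of intent recognition: map free-form questions to intents.
# Utterances and intent names are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_utterances = [
    "what's my current budget?",
    "how much budget do I have left?",
    "I cannot come to work today, I'm sick",
    "I need to call in sick",
]
intents = ["get_budget", "get_budget", "report_sick", "report_sick"]

# Turn text into TF-IDF features, then classify into an intent.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(training_utterances, intents)

print(model.predict(["hey, what's my budget?"])[0])  # -> "get_budget"
```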
Neo’s natural language understanding
Understanding spoken language is – basically – just an additional layer on top of the existing intent recognition. Speech is transcribed to text. Based on that text, the system applies machine learning “magic” (algorithms) to distill the user’s intention (the “intent”). Interaction isn’t a one-way street, though: computers learn to respond to these questions within a dialogue – and, voilà, we have a “conversational interface”.
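Put together, the pipeline described above could look roughly like this. Here, transcribe() and classify_intent() are hypothetical stand-ins for a real speech-to-text service and intent model:

```python
# Rough, runnable sketch of the conversational pipeline described above.
# transcribe() and classify_intent() are stand-ins, not Neo's implementation.

def transcribe(audio: bytes) -> str:
    # Stand-in: a real system would call a speech-to-text service here.
    return "what's my current budget?"

def classify_intent(text: str) -> str:
    # Stand-in: e.g. the classifier sketched in the previous section.
    return "get_budget" if "budget" in text else "report_sick"

HANDLERS = {
    "get_budget": lambda: "Your current budget is 12,000 EUR.",  # would query SAP
    "report_sick": lambda: "I've informed your manager and cleared your calendar.",
}

def handle_utterance(audio: bytes) -> str:
    text = transcribe(audio)         # 1. speech -> text
    intent = classify_intent(text)   # 2. text -> intent
    return HANDLERS[intent]()        # 3. intent -> response within the dialogue

print(handle_utterance(b""))  # -> "Your current budget is 12,000 EUR."
```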
Where is the interaction with digital assistants headed? And are we there yet?
We’ll now outline our 5 key hypotheses for the future of human-technology interaction and collaboration – which are, at the same time, the 5 key challenges we’re working on.
(1) Interfaces will be decentralized
Currently, a lot of our activity is centered around our smartphones: there’s an app for everything, whether on mobile or on the computer. But why should we have to carry our devices (and interfaces) with us? Is a mobile phone the ideal companion for every situation?
“Greater benefit would come from people getting the right thing, at the right time, on the best [available] device.” – Michal Levin in Designing Multi-Device Experiences
We are convinced that human-computer interaction will go in the opposite direction and that we’ll see a decentralization of interfaces. Software will be where it’s needed: on smartwatches, smart speakers, computers, machinery and, obviously, on your mobile. Instead of a “mobile-first” (or any device-first) approach to user-experience design, we’ll see a shift towards “context-first”: at what place, at what time and on which device does this interaction become the most useful it can be for the user?
(2) Breaking up system silos
In order for decentralization to work, systems have to be connected. What might sound trivial becomes a real challenge in our daily lives: just think about transferring a file from your mobile to your desktop. Repurposing email or messengers, we send mails to ourselves just to move files around.
As of now, interaction takes place in silos; systems are mostly closed ecosystems: Alexa cannot interact with my Mac – Siri on my iPhone cannot talk with my Windows computer – Google Assistant cannot connect with my desktop – et cetera.
A closed system in itself isn’t a bad thing, though – as long as it’s used within a specific, closed scope. In a “context-first” mindset, however, users expect systems to work for them, no matter where and when. So we need to break up these barriers and allow information and data to flow freely; systems and devices need to stay in sync with one another.
(3) Digital assistants have to understand context
If we interpret “natural language understanding” in a wider sense, our natural language makes use of signals other than speech, and it highly depends on context. Context, however, is a multi-dimensional concept – so let’s start with the basics and look at a clichéd chatbot conversation of today.
A bot – and even a digital assistant – quickly forgets what has been said before. Apart from conversational memory, context also includes the user’s environment, the time of day, the location and the device being used. But we also have to adapt to soft factors, such as mood, prior interactions, the task at hand, the intended goal, as well as the level of trust in the assistant. A digital assistant has to both detect and understand these signals, and alter its communication and behaviour accordingly: a stressed user expects a highly goal-oriented and efficient answer, whereas the same user might appreciate a small joke after a long meeting.
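As a thought experiment, the context an assistant has to track can be made explicit as a data structure it consults before answering. The fields and the simple mood rule below are invented for illustration:

```python
# Illustrative sketch: context as an explicit data structure the assistant
# consults before answering. Fields and rules are invented for illustration.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Context:
    history: list[str] = field(default_factory=list)   # what has been said before
    device: str = "mobile"                              # device being used
    location: str = "office"
    time: datetime = field(default_factory=datetime.now)
    mood: str = "neutral"                               # soft factor, e.g. inferred from phrasing

def style_for(ctx: Context) -> str:
    # A stressed user gets a terse, goal-oriented answer;
    # a relaxed one may get some small talk on top.
    return "terse" if ctx.mood == "stressed" else "conversational"

ctx = Context(mood="stressed")
print(style_for(ctx))  # -> "terse"
```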
(4) Kill the menu: Instant Actions
Hierarchies, menus and software-specific commands are an artefact of having all our apps and software on very few devices. If we, however, consider that interfaces will understand context and eventually be decentralized, we no longer need them: systems start to understand what we want without us having to navigate through menus; actions happen instantly.
By saying “turn the light off”, my smart home should understand which device I refer to by interpreting context: where am I, what time is it, which lights are turned on, et cetera. It should know which device I want switched off without me having to explicitly name it.
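A toy version of that resolution step might look like the following sketch – the device registry and the preference rules are, of course, made up:

```python
# Toy sketch: resolving "turn the light off" to a concrete device from
# context, without the user naming it. Devices and rules are invented.
lights = [
    {"name": "kitchen_light", "room": "kitchen", "on": True},
    {"name": "bedroom_light", "room": "bedroom", "on": True},
    {"name": "desk_lamp", "room": "office", "on": False},
]

def resolve_light(user_room: str) -> str | None:
    # Prefer a light that is actually on in the room the user is in;
    # fall back to any light that is on.
    candidates = [l for l in lights if l["on"]]
    in_room = [l for l in candidates if l["room"] == user_room]
    chosen = (in_room or candidates or [None])[0]
    return chosen["name"] if chosen else None

print(resolve_light("bedroom"))  # -> "bedroom_light"
```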
(5) Pro-active interactions
Lastly, our interaction with systems and technology will be of a much more passive nature: we will be informed whenever something happens or becomes relevant to us. It is no longer our job to trigger every action manually or on a regular schedule – actions will be triggered by our assistant when it anticipates that they’ll be relevant to us. For instance, why do we check our dashboards all the time, if we could simply be assured that everything’s alright, or be informed as soon as something deserves our attention? Why do we even have to say “turn off the lights”, when a system could know when we go to bed and turn off the lights automatically?
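Conceptually, this inverts the interaction into a monitoring loop: the assistant watches a signal and only speaks up when a threshold is crossed. A minimal sketch, with made-up figures and a made-up threshold:

```python
# Sketch of a proactive trigger: instead of the user polling a dashboard,
# the assistant checks the numbers and only notifies when something
# deserves attention. Figures and the 20% threshold are made up.

def check_budget(spent: float, budget: float, month_progress: float) -> str | None:
    expected = budget * month_progress   # assumes roughly linear spending
    if spent > expected * 1.2:           # 20% over the expected pace
        return (f"Heads up: you've spent {spent:,.0f} of {budget:,.0f} "
                f"- you're trending over budget.")
    return None                          # stay silent when everything is fine

message = check_budget(spent=9_000, budget=12_000, month_progress=0.5)
if message:
    print(message)
```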
Effective human-computer symbiosis with a digital assistant
If these 5 aspects – decentralization, connectivity, understanding of context, instant actions and pro-activity – were combined into one digital assistant, we could create an effective human-machine symbiosis and collaboration.
“We need to think in ecosystems and build […] interfaces into continuous and complementary experiences to finally make them useful.” – Jan Koenig, CEO & Co-Founder @ Jovo.Tech
Computers should be responsible for the areas in which they excel (data analysis, API communication, et cetera), whereas we as humans focus on creative tasks such as research and creation, and on fields that center on human-human interaction (e.g. sales). There’s a long and rocky road ahead of us – but, as always, we’ll have to work it out step by step. We are looking forward to these challenges and are excited to build this future. If you’re eager to go on this journey as well, we’re always looking for talented minds to join us in our mission.