The battle for conversational interfaces

When is a voice-controlled interface the right choice?

Conversational interfaces are sometimes very useful, but spoken language is not always the best option. We may overestimate spoken language without realizing that we often communicate better and more efficiently by other means. Learn more in the "In the Code" article by our CI expert Michael Wechner, published on

A war is raging between the tech giants for supremacy in conversational interfaces. The winners get to collect even more data, which allows a better understanding of user behavior. But what is the benefit of a conversational interface for the user? The poet Carl Sandburg wrote in 1936, "Sometime they'll give a war and nobody will come." Is the conversational interface just hype, driven by the fight for survival among the tech giants?

To answer the question about the benefit to the user, it helps to ask why people speak at all. Language is a form of communication with which we exchange information, and sharing information is a means of survival. MRI scans show that the same reward system in the human brain is active during sex, eating and sharing information. Languages have evolved because they help us survive. The conclusion: we should not implement conversational interfaces for the sake of language itself, but ask ourselves what helps us to survive. The use of language then follows as a natural consequence. The following sections give some examples to illustrate this point.

Voice interface versus button

Suppose we are flying back from a business trip and want to get home from the airport as quickly as possible. However, we don't know when the next train leaves or whether we will have to hurry to catch it at the airport station. If we had a human assistant who knew the train schedule by heart, we would ask him when the next train home was leaving. The assistant would reply, "There's a train in 13 minutes that we can still catch, but we'll have to hurry," or "A train just left and the next one leaves in 40 minutes, so we can take our time getting off the plane." Many mobile train schedule apps now have a "Take me home" button: the app knows where we live and, thanks to geo-location, suggests the next departures at the push of a button. While still on board the plane, we will probably prefer the "Take me home" button to the voice interface. It is efficient, and the other passengers nearby won't hear where we live. But if we are already on the way to the train station with luggage in both hands, we would be glad to have a voice interface.
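The logic behind such a "Take me home" button can be sketched in a few lines. This is only an illustration with a hypothetical, hard-coded schedule; a real app would resolve the user's position via geo-location and query a timetable API.

```python
from datetime import datetime, timedelta

# Hypothetical schedule for the airport-to-home line: trains leave at
# these minutes past every hour. A real app would fetch live data.
SCHEDULE_MINUTES = [0, 40]

def next_departures(now: datetime, count: int = 2) -> list:
    """Return the next `count` departure times after `now`."""
    departures = []
    t = now.replace(minute=0, second=0, microsecond=0)
    while len(departures) < count:
        for m in SCHEDULE_MINUTES:
            dep = t + timedelta(minutes=m)
            if dep > now and len(departures) < count:
                departures.append(dep)
        t += timedelta(hours=1)
    return departures

def take_me_home(now: datetime) -> str:
    """Produce the assistant-style answer for the next train home."""
    nxt = next_departures(now, 1)[0]
    minutes = int((nxt - now).total_seconds() // 60)
    if minutes <= 15:
        return f"There's a train in {minutes} minutes - hurry!"
    return f"The next train leaves in {minutes} minutes. Take your time."
```

The same function could sit behind either a button or a voice interface; only the input and output channels differ.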

Conversational interfaces for shopping

For some time, Amazon has been using Amazon Go to digitize the offline purchasing process. It is likely that Amazon will roll this out in the grocery chain Whole Foods, which it took over in the summer of 2017 for USD 13.7 billion. The offline end-to-end shopping process may become fully digital: customers compile their shopping list at home or on the go using the voice interface on their mobile. When they enter the store, the shopping list is arranged according to the shop layout. Augmented reality on the mobile helps customers find their products and shows additional product information. The app syncs automatically when a product is placed in the physical shopping cart, and customers no longer have to wait at a checkout but can conveniently pay via the app as they leave the store. The voice interface is just one piece of the puzzle in this process, but it makes sense for entering the shopping list, where it is very efficient.
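Arranging the shopping list according to the shop layout is essentially a sort by walking order. The following sketch uses invented aisle and category data purely for illustration; in practice the per-store layout would come from the retailer's backend.

```python
# Hypothetical walking order of product categories in one store.
AISLE_ORDER = {"produce": 0, "bakery": 1, "dairy": 2, "frozen": 3}

# Hypothetical product-to-category mapping.
CATEGORY = {
    "apples": "produce", "bread": "bakery",
    "milk": "dairy", "yogurt": "dairy", "ice cream": "frozen",
}

def arrange_by_layout(shopping_list):
    """Sort items into store-walking order; unknown items go last."""
    return sorted(
        shopping_list,
        key=lambda item: AISLE_ORDER.get(CATEGORY.get(item, ""), len(AISLE_ORDER)),
    )
```

For example, `arrange_by_layout(["milk", "apples", "ice cream", "bread"])` yields the list in the order the customer walks past the aisles: produce first, frozen last.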

Talking to a robot in the elevator

People with muscular atrophy suffer from progressive muscle weakness. Modern medicines may slow the progression, but cannot stop it. The main aim is to preserve quality of life as long as possible. One aid is a robotic arm that is attached to the wheelchair and controlled with a joystick; it can be used, for example, to select the floor in an elevator. An elevator with a built-in voice interface would be the simplest solution, but not every elevator will be equipped with one in the near future. A possible intermediate solution would be a robotic arm with a voice interface and image recognition: if the person in the elevator could tell the robotic arm "please go to the 2nd floor", it would automatically locate the control panel and press the button for the 2nd floor.
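The language-understanding part of such an intermediate solution can be surprisingly simple. The sketch below extracts the requested floor from an English command of the form "<ordinal> floor"; it is a minimal illustration, not part of any real product, and a production system would of course need speech recognition and a far more robust parser.

```python
import re

# Map ordinal words (and their digit forms) to floor numbers.
ORDINALS = {
    "ground": 0, "first": 1, "1st": 1, "second": 2, "2nd": 2,
    "third": 3, "3rd": 3, "fourth": 4, "4th": 4,
}

def parse_floor(utterance: str):
    """Extract the floor number from commands like
    'please go to the 2nd floor'. Returns None if no floor is found."""
    words = re.findall(r"[a-z0-9]+", utterance.lower())
    for i, word in enumerate(words):
        if word == "floor" and i > 0:
            prev = words[i - 1]
            if prev in ORDINALS:
                return ORDINALS[prev]
            if prev.isdigit():
                return int(prev)
    return None
```

The extracted number would then drive the image-recognition step that locates and presses the matching button.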

Are conversational interfaces overrated?

The applications described above show that conversational interfaces are sometimes very useful, but spoken language is not always the best option. We may overestimate it without realizing that we often communicate better and more efficiently by other means. Locomotion provides a good analogy: nature evolved legs and fins, yet humans developed wheels, propellers and jet nozzles, which, depending on the field of application, are more efficient. It therefore makes sense to first identify the actual problem and only then find the best solution for it.

Learning from babies

Back, then, to the question of why we humans speak. What options does a newborn child have to communicate when it is hungry, in pain or wants something else? How often do we see screaming children whose parents don't know why they are screaming? Apart from gestures and visible behavior, how else can the child communicate? Do we learn to speak for lack of other choices?

Toddlers learn about 50 words on average between 18 and 24 months of age. From then on, the so-called vocabulary explosion usually begins: the vocabulary grows significantly within a relatively short time, and children begin to form whole sentences. There are possible explanations for the vocabulary explosion, such as the fast-mapping theory or the research of Bob McMurray, even though the scientific community does not seem to have reached a consensus, perhaps also because the topic was not relevant enough until now.

Like toddlers, we are currently learning our first words in the development of conversational interfaces and trying to form whole sentences. The tech giants' fight for survival is driving this development. It is still difficult to estimate if and when the explosion will take place, but it is exactly this unpredictability that makes conversational interfaces such an exciting topic!

What the users are saying: Joint study with Zeix

Zeix and Netcetera have jointly conducted a user survey based on a prototype voice-controlled train schedule app developed by Netcetera. The key findings are:

  • Users often don't know what to ask or how to ask it. Expectations of voice-input interfaces range from minimal to huge, and the potential for frustration is correspondingly high.
  • More complex conversational interfaces sometimes demand considerable cognitive ability in processing the conversation, and current artificial intelligence is usually not yet up to it. Most users would not reuse an interface after a disappointing experience. Very simple conversational interfaces, which require no special cognitive skills in the backend, work well.
  • Voice-input interfaces are especially useful in situations where the hands are not free. Any obstacles to opening the interface should therefore be kept to a minimum.

The full study is available here (in German).

Behind the scenes: read about the functionality of a voice-controlled timetabling app:
