
Interface/Off (Part III): Touch-Free Systems

(Context: this is Part III of a look at the future of smart product interface technology; Part I set the stage and Part II looked at touch-based systems.)

Continuing along the spectrum of increasing sophistication and novelty, we come to touch-free interface systems. In the last post, I talked about some of the limitations of touch-based systems (usage scenario rigidity, physical contact requirements, and inherent 2D-ness). Some of the interesting advantages of touch-free approaches revolve around their ability to address those same limitations.

Gestural
Gestural interfaces typically use a vision capture system, such as a camera paired with image processing algorithms, to recognize hand and arm gestures made by the user. If you have seen the movie Minority Report, you have seen a memorable example of the concept. This type of approach lets the user behave very intuitively (e.g. grabbing or tapping an icon to interact with it) but, in contrast to touch screens, does not require the user to make physical contact with the interface and can be much more flexible in terms of user location. As an example, imagine a TV with a gestural interface that could be controlled equally well whether you were sitting on a sofa on one side of the room or a chair on the other. There have also been some interesting developments in wearable systems, such as SixthSense, developed by Pranav Mistry at the MIT Media Lab.

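To make the sensing side concrete, here is a minimal sketch in Python, assuming OpenCV 4.x and a crude skin-color heuristic. Every threshold here is illustrative rather than a reference implementation, and real products use far more robust hand models:

import cv2
import numpy as np

LOWER_SKIN = np.array([0, 48, 80], dtype=np.uint8)     # rough skin tones in HSV
UPPER_SKIN = np.array([20, 255, 255], dtype=np.uint8)  # lighting-dependent!

cap = cv2.VideoCapture(0)                    # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_SKIN, UPPER_SKIN)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)   # assume largest blob is the hand
        if cv2.contourArea(hand) > 5000:            # ignore small noise blobs
            x, y, w, h = cv2.boundingRect(hand)
            # A real system would feed this tracked region to a gesture
            # classifier; here we just report the hand's position.
            print("hand at", (x + w // 2, y + h // 2))
    cv2.imshow("hand mask", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
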
However, these systems have some inherent limitations as well as implementation challenges. First, the same gesture can look very different for different users, or even for the same user on different occasions, so gestural systems have to “translate” gestures that look different but mean the same thing. Second, the hardware (a camera that can distinguish minute hand movements) and software (the algorithms that translate those subtle movements into the user’s intent) are non-trivial from an engineering standpoint.

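One classic way to attack the “same gesture, different appearance” problem is template matching with normalization, in the spirit of the $1 unistroke recognizer: resample each gesture path to a fixed number of points, remove translation and scale, then compare against stored templates. A sketch, with the gesture names hypothetical:

import math

def _dist(a, b):
    return math.hypot(b[0] - a[0], b[1] - a[1])

def resample(points, n=32):
    # Redistribute the path over n evenly spaced points, so that fast and
    # slow performances of the same gesture produce the same shape.
    pts = list(points)
    interval = sum(_dist(pts[i - 1], pts[i]) for i in range(1, len(pts))) / (n - 1)
    out, acc, i = [pts[0]], 0.0, 1
    while i < len(pts):
        d = _dist(pts[i - 1], pts[i])
        if d > 0 and acc + d >= interval:
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            out.append(q)
            pts.insert(i, q)   # continue measuring from the interpolated point
            acc = 0.0
        else:
            acc += d
        i += 1
    while len(out) < n:        # guard against floating-point shortfall
        out.append(pts[-1])
    return out

def normalize(points):
    # Remove translation (center on the centroid) and scale (fit a unit box).
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    pts = [(x - cx, y - cy) for x, y in points]
    s = max(max(abs(x), abs(y)) for x, y in pts) or 1.0
    return [(x / s, y / s) for x, y in pts]

def recognize(candidate, templates):
    # Score the candidate against each stored template; lower is better.
    cand = normalize(resample(candidate))
    scores = {}
    for name, tmpl in templates.items():
        t = normalize(resample(tmpl))
        scores[name] = sum(_dist(a, b) for a, b in zip(cand, t)) / len(cand)
    return min(scores, key=scores.get)

# e.g. recognize(stroke, {"swipe_right": right_template, "circle": circle_template})

A production recognizer would also normalize rotation and learn from many examples per gesture, but the core idea – map variable executions onto a canonical form before comparing – is the same.
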
Eye Tracking
Eye tracking could be considered a variant in the gestural interface family, albeit one that is better suited to close-in use. The idea is that instead of tracking hand and arm gestures, these systems track tiny movements of the user’s eyes. For instance, imagine flicking your eyes down and then back up to scroll this webpage down. The main advantage is accessibility: eye tracking can serve users whose medical conditions prevent them from using traditional interfaces. A number of companies, such as Tobii, have made impressive progress in commercializing systems for a broader audience, but I question whether these systems will be able to match the performance of the alternatives in the long run. The size and range of motion of the human eyes are so limited relative to the size and freedom of the hands that I think it will be an uphill battle for eye tracking outside of applications where it is the only option.

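As a toy illustration of the idea, here is a sketch of how a stream of vertical gaze positions (which a commercial tracker would supply; the samples below are hypothetical, as are the thresholds) might be turned into the “flick down to scroll” command described above:

def detect_down_flicks(gaze_ys, drop=0.25, window=8):
    # gaze_ys: normalized vertical gaze samples (0.0 = top of screen, 1.0 = bottom).
    # Yields "scroll_down" when the gaze dips sharply and snaps back -- a flick --
    # rather than drifting downward, which would just be ordinary reading.
    buf = []
    for y in gaze_ys:
        buf.append(y)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window:
            start, deepest, end = buf[0], max(buf), buf[-1]
            if deepest - start > drop and abs(end - start) < drop / 3:
                yield "scroll_down"
                buf.clear()          # avoid double-counting the same flick

# Hypothetical stream: steady gaze, a quick flick down and back, steady gaze.
samples = [0.30, 0.31, 0.30, 0.62, 0.70, 0.45, 0.31, 0.30, 0.30, 0.31]
print(list(detect_down_flicks(samples)))   # ['scroll_down']
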
Voice Command
In fundamental contrast to the eyes, which are typically used to receive data, and the arms and hands, which rarely convey large amounts of data, the human voice is anatomically and culturally evolved to transmit lots and lots of data. This makes it very well suited to telling your smart product what you want it to do. However, implementing an effective system is quite a technological challenge. The first step is recognizing words – imagine a voice control system used as a keyboard replacement. Newer systems, such as Siri on recent iPhone models, attempt to make the jump from recognizing words to understanding meaning and responding appropriately. This is a profound shift, and one that will likely require continued refinement to accommodate different accents, dialects, styles of pronunciation and enunciation, and the other quirks of individual human speech. The human brain is an astounding speech-processing tool, but I believe that speech recognition algorithms will continue to make significant strides over the next few years and will match humans’ ability to understand each other within a decade.

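The words-versus-meaning distinction is easy to see in code. In the sketch below, speech-to-text is assumed to have happened upstream; only the second step is shown, mapping a recognized transcript onto a structured intent. The intent names and patterns are hypothetical, and real assistants use statistical language understanding rather than hand-written rules:

import re

# Hypothetical intent patterns -- a rule-based miniature of "understanding".
INTENT_PATTERNS = [
    ("set_alarm",    re.compile(r"\b(?:set|create)\b.*?\balarm\b.*?\b(\d{1,2})(?::(\d{2}))?\b")),
    ("send_message", re.compile(r"\b(?:text|message)\b\s+(\w+)")),
    ("get_weather",  re.compile(r"\bweather\b")),
]

def parse_intent(transcript):
    # Map a recognized word string (from an upstream speech-to-text stage)
    # to an intent plus any captured slots, or None if nothing matches.
    text = transcript.lower()
    for intent, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return {"intent": intent, "slots": [g for g in match.groups() if g]}
    return None

print(parse_intent("Please set an alarm for 7:30"))   # {'intent': 'set_alarm', 'slots': ['7', '30']}
print(parse_intent("What's the weather like?"))       # {'intent': 'get_weather', 'slots': []}

The gap between this toy and genuinely understanding free-form speech is exactly the refinement problem described above.
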
One arena in which voice systems are limited, however, is in conveying information back to the user. There is a reason for the expression “a picture is worth a thousand words” – there are certain concepts that are easier to convey visually than verbally. There are also potential scaling issues with voice systems – imagine everyone on a subway or in an office trying to control their phone or computer simultaneously with voice commands. All in all, though, I think that when combined with displays, voice control systems will be a significant part of the technology landscape in the future.

In my next post, I will look at the most sci-fi-esque possibility – brain-computer interfaces (thought control again!).
