Scaling Free-Form Gesture Control 

(Through Massively Useful Gesture Systems)

May, 2019

The following is my capstone project at Parsons The New School. This project was born out of my curiosity about why, despite the technology available to us today, we are not interacting with computers in truly collaborative and natural ways.


I would like to thank the following people for their invaluable contributions to this project:

Ed Keller
Pete Hawkes
Shishir Raut

“Until now, we have always had to adapt to the limits of technology and conform the way we work with computers to a set of arbitrary conventions and procedures. With NUI, computing devices will adapt to our needs and preferences for the first time and humans will begin to use technology in whatever way is most comfortable and natural for us.”

—Bill Gates


This is not to say that our current device ecosystems, built on graphical user interfaces (GUIs), are not useful. GUIs are ultra-effective when it comes to capturing attention. Whenever you turn on your phone, you are greeted by a cornucopia of alluring, colorful icons, all vying for your attention. How many notifications have you missed? Which platform should you browse while you’re waiting for the elevator?

Current interfaces are effective for presenting certain information to individuals. However, we encounter problems with these interfaces when we try to accomplish tasks across applications and contexts.

Microsoft HoloLens

Today, everything from consumer VR gaming to cutting-edge mixed reality systems still relies heavily on interfaces that use hardware controllers and skeuomorphic UI.

Initial Questions
What are the roadblocks to our transition towards more natural user interfaces? Why do we seem stuck with graphical user interfaces? How might businesses and designers move towards new multi-modal human-machine interactions that enhance the creative potential of all humans? These are the questions that I explored in my second year at Parsons.


1 — Intro

2 — Research 
    —Diary study
    —User archetypes
    —Evolution of user interfaces
    —The computer’s POV
    —A gap in the market
3 — Key Insights
4 — Massively Useful Gesture Systems
    —Key aspects
    —Design Framework

5 — Universal Gesture Language  
    —Interaction classification
    —A baseline
    —Visual communication

    —Gestr—Universal Gesture Controller
    —A live effects controller
    —Augmented reality dueling game



Research question

How might we facilitate the transition towards new modes of human-machine interactions through touchless gesture?


—Diary Studies
—Expert Interviews
—Desk Research

synthesized survey results

I intuited that what anchors us to linear human-computer interactions is the high-speed evolution of digital platforms and the advertising-dependent business models they spawned, the increased accessibility of smartphones, and the convenient growth that centralized mobile application marketplaces offer to tech startups.

My research began by seeking to learn how city-dwellers live with devices and machines. What I learned was that most people have a complex relationship with their personal devices. While they enjoy their phones and laptops, they somehow feel constrained, and their interactions feel inherently unnatural. And while those surveyed own between 3 and 10 internet-connected devices on average, they feel over-reliant on their smartphones. It seems our current devices do little to lessen our cognitive load.

The way current interfaces are designed results in high short-term end-user computing satisfaction, but the feedback loop it creates is negative for long-term cognition.

We get the hit of dopamine from getting likes, but over time, we question the motives behind chasing those likes.

Diary Study

Over the semester-long study, I gave in to my addictive tendencies with regard to my iPhone. I logged an average of 1540 pickups and 54 hours of screen time per week. In my diary, I noted a correlation between how delightful my interactions with machines felt and the ease with which I was able to navigate the functions of the device and switch between applications and contexts. Above all else, smooth context transitions, even those between connected and analog machines, were key to a pleasing experience.

Information filtered through smartphone

User Archetypes

Evolution of User Interfaces

Laying the groundwork for transition:
Imagine connecting iOS or Windows directly to your brain...

From the computer’s point of view

A gap in the market

based on general products in each category

3—Key Insights

Characteristics of incumbent device ecosystems

  • Personal device dependent

  • Requires explanation through text

  • Intrinsically against biology (neck pain, carpal tunnel syndrome, strain on eyes)

What is holding back wide adoption of free-form gesture control products?


  • Limitations of multi-sensory feedback technologies

  • Business models that focus on high-end, niche products

  • Focus on new complex, immersive, single-user experiences rather than trying to eliminate the frustrations of current systems

Product Design / User Experience

  • Current products with free-form gesture control tie gestures to specific functions or require multiple components for interactions (joystick rings, wands, etc.)

  • No existing visual language that communicates information about gesture environments 

Mental Models

“Gesture control devices are expensive.”

“Free-form (contactless) gesture control is gimmicky”

Iceberg Model

New design frameworks that leverage interactions that are spatial and real-world based can lend us agency against addictive software.

4—Massively Useful Gesture Systems (M.U.G.S)

Below I have outlined a framework for the design of what I call Massively Useful Gesture Systems (MUGS).

Emphasis on the massively useful.

The aim of the second stage of my capstone project was to find ways to deliver UIs that are directly connected to the user’s intent, so as to give the user agency that was not possible before.

Key aspects of MUGS


Context adaptiveness





possible components of a MUGS

Firstly, we have to assemble the necessary components for creating radically multi-modal human-machine interaction (HMI).


Context adaptiveness

Because MUGS move with and cohabit the same space as dynamic users, their components need to adapt to the various real-world environments and contexts in which they operate.


Fluid context transitions are key to maintaining smooth user flow.




Criteria for a universal baseline gesture language

  • No existing connotations

  • Culturally neutral 

  • Easily detectable by devices

  • Easily intelligible to other humans

Criteria for visual communication of gesture-integrated interfaces

  • Needs to be legible in most environments and at different scales

  • Easily detectable by devices

“A better way to design the future things of everyday life is to use richer, more informative, less intrusive signals: natural signals. Use rich, complex, natural lights and sounds so that people can tell whether a sound is in front or behind, up or down, what the material and composition is of visible objects, whether an expected event is near in time or far, critical or not.”

Excerpt From
The Design of Future Things
Don Norman

Heuristics for the design of MUGS

The space is the map

The physical space the user inhabits is highly important for the success of the interaction. Consider how humans move within the space. How will the interface decontextualize the space, take advantage of its affordances, and transform with the objects within it?

Devices as nodes in a network

Take into account the multitude of devices that can be collected to form spatial interfaces. What are their characteristics and limitations?

Variable feedback mechanisms

Consider how the spatial qualities of the interface will be communicated through feedback models. How will the user be made continuously aware through multi-sensory feedback? If the optimal feedback is not available, what are the alternatives?

Scalable Input Resolution

Sensors have limits. How will your interface deal with the user’s movements in relation to sensors? How will your interface capture information from the user?

Vector of Intent

Spatial interfaces can be computationally intensive. To accommodate the limitations of computer bandwidth, consider ways to track the user’s vector of intent through body orientation, facial tracking, etc. Consider the minimum amount of information needed to orient the interface to the user(s).
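One way to picture the vector of intent is as a simple angle test: is the user facing the interface closely enough to count as addressing it? The sketch below is only an illustration of that idea. The function names, the 3D coordinate tuples and the 20-degree cone are all assumptions of mine and are not part of the MUGS framework itself.

```python
import math

def intent_angle(origin, direction, target):
    """Angle in degrees between the user's facing direction and the line to a target.

    origin:    (x, y, z) position of the user (hypothetical tracking output)
    direction: (x, y, z) facing vector, e.g. from body or head orientation
    target:    (x, y, z) position of the interface
    """
    to_target = [t - o for t, o in zip(target, origin)]
    dot = sum(d * t for d, t in zip(direction, to_target))
    norm_d = math.sqrt(sum(d * d for d in direction))
    norm_t = math.sqrt(sum(t * t for t in to_target))
    if norm_d == 0 or norm_t == 0:
        raise ValueError("direction and target offset must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_d * norm_t)))
    return math.degrees(math.acos(cos_angle))

def is_addressing(origin, direction, target, max_angle_deg=20.0):
    """True if the target lies within an assumed cone around the facing direction."""
    return intent_angle(origin, direction, target) <= max_angle_deg
```

A single direction vector per user is far cheaper to track than a full skeleton, which is the point of this heuristic: the interface only needs enough information to know whom to orient itself towards.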


Gestural new user interfaces developed by Oblong Industries

“Japanese product designer Naoto Fukasawa has observed that the best designs are those that "dissolve in behavior," meaning that the products themselves disappear into whatever the user is doing. It's seemingly effortless (although certainly not for those creating this sort of frictionless system—intuitive, natural designs require significant effort) and a nearly subconscious act to use the product to accomplish what you want to do. This is the promise of interactive gestures in general: that we'll be able to empower the gestures that we already do and give them further influence and meaning.”

Excerpt from
Designing Gestural Interfaces
by Dan Saffer

5—Universal Gesture Language

I propose the design of a universal baseline language and a design framework for spatial interfaces that will allow designers to better envision future interfaces. This language will leverage the dexterity of the human anatomy for multi-level spatial interactions.

UX/UI designers need a language of gestures compatible with existing interfaces

Based on a synthesis of the research, I found that the most frequent function when interacting with interfaces is navigation. In workshops designed to find the most intuitive mode of navigation, participants most often used pointing gestures to navigate the tests. Pointing is one of the most utilitarian and basic of gestures. Naturally, then, the “point” gesture can be the foundation on which to build a “universal” gesture language.

In an interview, John Underkoffler points out the power of human pointing, referring to pointing as the creation of a "magic vector mask" that is "high-density and both gestural and spatial."

Interaction classification

We can classify interactions with interfaces into three broad categories: in-application navigation, cross-application navigation, and all other interactions, such as data input, that require higher input resolution.

A baseline 

A basic outline for the Universal Gesture Language (UGL)

Note: the model considers the feedback for each level of gestures.

Level 1—Pointing

The UGL is divided into 3 levels according to 3 main interaction categories.

The first level is appendage-agnostic and proportional.
If the distance between the two points in 1-B is x, then the distance between them in 1-C will always be greater than 4x.
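The proportional rule can be sketched as a ratio check. Here 1-B and 1-C refer to the gesture diagrams, which are not reproduced in this text, so the sketch assumes only what the rule states: two tracked points, a baseline distance x taken from 1-B, and a threshold of 4x for 1-C. The function name and the coordinate format are hypothetical.

```python
import math

RATIO_1C = 4.0  # from the rule above: 1-C spans more than 4x the 1-B distance

def is_gesture_1c(point_a, point_b, baseline_x):
    """True if the span between two tracked points exceeds 4x the 1-B baseline.

    point_a, point_b: coordinates of the two tracked points (any dimension)
    baseline_x:       distance between the two points measured in 1-B
    """
    if baseline_x <= 0:
        raise ValueError("baseline_x must be positive")
    return math.dist(point_a, point_b) > RATIO_1C * baseline_x
```

Because the test compares a ratio rather than an absolute distance, the same check would hold whether the two points are fingertips a few centimetres apart or hands at arm's length, which is what makes Level 1 appendage-agnostic.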

Level 2—Digits

Level 2 is where fingers come into play. The dexterity of the human hand is leveraged for more detailed control. 

Level 3—Glyphs

Single finger glyph
Two finger glyph
Three finger glyph

Level 3 consists of any continuous flowing motion drawn in space. This level would be useful for more complex interactions such as text input via something like a swipe keyboard or activating what would be akin to macro keys in new user interfaces. 

Two hands+
UGL interfaces can be designed for two or more hands.

Visual Communication

Left: symbol that delineates spaces that contain spatial interfaces

An aspect of MUGS is the communication between devices throughout a variety of real-world environments. The distributed nature of this system calls for a system of visual communication that informs the user about the interface through a mix of real-world and digital visual feedback.

Interactive space directly above 
Interactive space above and forward
Interactive space above, forward and behind


symbol connotes an interface that requires an activation gesture.

The design of the gesture for “activation” or “intent for engagement” with an interface is derived from a design process that involved examining the limitations of sensor technology while considering accessibility for human users.
The result is a gesture that involves holding two points apart and moving them to complete a circle.

In multi-user scenarios in public spaces, an activation gesture can be useful for preventing unintentional interactions.  
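As a rough illustration of how a device might recognise the activation gesture, the sketch below accumulates the rotation of the line joining the two tracked points and fires once enough rotation has been swept. This is one possible reading of "holding two points apart and moving them to complete a circle", not a specification. It assumes 2D coordinates per frame, and that two points on opposite sides of a circle complete it after roughly half a turn each. The function names, the minimum separation and the 0.95 tolerance are all hypothetical.

```python
import math

def swept_angle(samples):
    """Total signed rotation (radians) of the segment joining two tracked points.

    samples: sequence of ((ax, ay), (bx, by)) pairs, one pair per frame.
    """
    total = 0.0
    previous = None
    for (ax, ay), (bx, by) in samples:
        heading = math.atan2(by - ay, bx - ax)
        if previous is not None:
            # Wrap the step into [-pi, pi) so crossing the +/-pi seam stays small.
            delta = (heading - previous + math.pi) % (2 * math.pi) - math.pi
            total += delta
        previous = heading
    return total

def is_activation(samples, min_separation=0.1, required_sweep=0.95 * math.pi):
    """True if the two points stay apart and sweep the assumed amount of rotation.

    min_separation and required_sweep are illustrative values, not measured ones.
    """
    for point_a, point_b in samples:
        if math.dist(point_a, point_b) < min_separation:
            return False  # points collapsed together: not "held apart"
    return abs(swept_angle(samples)) >= required_sweep
```

Requiring both a sustained separation and a deliberate sweep is what would make such a gesture hard to trigger by accident, which matters in the multi-user public settings described above.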

Examples of how the “activation” gesture can be executed.

UGL_Lean Canvas
UGL_Value Proposition Map


Below are examples of how MUGS and UGL can be used for innovative products while acting as a potential catalyst for mass-adoption. 

A Universal Gesture Controller

The MUGS framework and Universal Gesture Language can be applied towards mass-user applications.


Value Proposition

Gestr is a spatial interface platform that allows seamless control of any connected device.


A live effects controller

A system that allows stage performers to manipulate sound and visual effects through free-form gestures.

Augmented reality dueling game

Users duel each other on smartphones by pointing the back-facing camera towards their opponent(s) and using controls based on UGL.


As we become more immersed and connected to our technologies, we can envision a future scenario where we embody technology through our biology.


  • In the next five years, we will see the personal device ecosystems diversify beyond smartphones and smartwatches to smartglasses, gesture control rings, bands, haptic clothing and beyond. 

  • Proliferation of connected home devices.

  • Users will push for more integration of low-complexity niche wearable products (mixed reality visualization devices) to interface with their more powerful devices (smartphones, laptops).

  • Consumer sentiment will shift from "bigger phone is better" to "the phone that can best interact across all my devices." Google and Amazon have a leg up in this regard.

  • The big tech players will start to compete for mixed reality dominance.

  • Spatial interfaces will start to become more functional for everyday tasks.

  • The rise of spatial UI designers and gesture designers.

Lingering questions and thoughts

In a mass-adoption scenario, how will the integration of a universal gesture language impact communication on a planetary level?

UGL derivatives (offshoot dialects) could transform the way humans communicate offline as well, by inscribing meaning onto our body gestures as a form of signal enhancement.


Bainbridge, William (2004). Berkshire Encyclopedia of Human-computer Interaction. Berkshire Publishing Group LLC. p. 483.

Building a Gesture Recognition System using Deep Learning.

Clifton PG, Chang JS, Yeboah G, et al. Design of embodied interfaces for engaging spatial cognition. Cogn Res Princ Implic. 2016;1(1):24. doi:10.1186/s41235-016-0032-5

Deloitte: Americans Look at Their Smartphones More Than 12 Billion Times Daily, Even as Usage Habits Mature and Device Growth Plateaus.

John Underkoffler: Sci-Fi Interface Design in the Real World | MIND & MACHINE. YouTube.

Saffer, D. (2009). Designing gestural interfaces. O'Reilly Media.

Thorsten O Zander and Christian Kothe (2011). Towards passive brain–computer interfaces: applying brain–computer interface technology to human–machine systems in general. IOP Publishing Ltd.

Tomlinson, G. (2018). A million years of music: The emergence of human modernity. New York: Zone Books.

©2021 Kevin Lo.
All Rights Reserved.