Apr.–June 2014, vol. 13, no. 2, pp. 84–87
1536-1268/14/$31.00 © 2014 IEEE

Published by the IEEE Computer Society
Behavioral Imaging and Autism
James M. Rehg, Georgia Institute of Technology

Agata Rozga, Georgia Institute of Technology

Gregory D. Abowd, Georgia Institute of Technology

Matthew S. Goodwin, Northeastern University
Behavioral imaging encompasses the use of computational sensing and modeling techniques to measure and analyze human behavior. This article discusses a research program focused on the study of dyadic social interactions between children and their caregivers and peers. The study has resulted in a dataset containing semi-structured play interactions between children and adults. Behavioral imaging could broadly affect the quality of care for individuals with a developmental or behavioral disorder.
Beginning in infancy, individuals acquire social and communication skills that are vital for establishing the social relations needed for a healthy and productive life. Children with developmental delays (such as autism spectrum disorders) face great challenges in acquiring these skills, resulting in substantial lifetime risks. Many developmental delays don't have a clear genetic basis and can only be detected through the measurement and analysis of a child's behavior.
In the US, 50 percent of children with developmental disabilities lose an important window for early intervention because their condition isn't identified until they start school.1 Even in cases where a disorder can be identified through genetic testing, as with Down syndrome, intervention and monitoring of an affected individual depend entirely on behavioral observation. Current methods for acquiring social and communication behavioral data are so labor-intensive as to preclude large-scale screening and early interventions, resulting in substantial disparities in outcomes.
Computational sensing and modeling could play a key role in transforming the measurement, analysis, and understanding of human behavior. We refer to this area of research and technology development as behavioral imaging, an analogy to the medical imaging technologies that revolutionized internal medicine in the 20th century. A similar opportunity exists to create new capabilities for the quantitative understanding of human behavior and development.
Studying Human Behavior
Computing can impact the study of human behavior in at least three different ways.
Capturing Multimodal Portraits of Behavior
First, the widespread availability and increasingly low cost of sensor technology make it possible to capture a multimodal portrait of behavior through video, audio, and wearable sensing. These modalities, including accelerometry, can sense the overt behavioral cues—such as facial expressions, vocalizations, and gestures—that a trained clinician relies on.
In addition, on-body sensors can provide measurements of physiological processes—such as electrodermal, cardiovascular, and respiratory activity—which both influence behavior and are influenced by it. Physiological sensors provide an additional covert window into behavior that's not typically available to a clinician directly.
Detecting and Measuring Behavior
Second, given access to dense, continuous, multimodal sensor streams, advanced machine-learning and data-mining methods can be used to detect the occurrence of behavioral events and measure their attributes. For example, the ability to detect the utterance of a child's name and track the subsequent rotation of the child's head toward the sound makes it possible to identify whether the child responded to his or her name, and to describe this response computationally via attributes such as latency and trajectory.
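The response-to-name example could be computed from such event streams roughly as follows. This is a minimal sketch under assumed representations (a name-call timestamp from audio analysis and per-frame head-yaw estimates from video); the thresholds and data format are illustrative, not the system described here:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HeadSample:
    t: float    # time in seconds
    yaw: float  # head yaw in degrees; 0 means facing the examiner

def response_latency(name_call_t: float,
                     head: List[HeadSample],
                     window: float = 3.0,
                     facing_thresh: float = 15.0) -> Optional[float]:
    """Latency (seconds) until the child's head turns toward the
    examiner after a name call, or None if no turn occurs in time."""
    for s in head:
        if name_call_t <= s.t <= name_call_t + window and abs(s.yaw) < facing_thresh:
            return s.t - name_call_t
    return None

# Synthetic trajectory: the head rotates from 60 degrees to facing
# the examiner over two seconds after a name call at t = 10.0.
samples = [HeadSample(10.0 + 0.25 * i, 60.0 - 15.0 * i) for i in range(8)]
print(response_latency(10.0, samples))  # 1.0
```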
By exploiting dense sensor data, it might be possible to develop novel computational models of behavior. This endeavor connects to a long history of activity recognition and behavior modeling in a diverse set of fields such as pervasive computing and computer vision. However, the study of dyadic social behaviors raises a unique set of challenges that haven't been addressed in previous research.
Observing Behavior in Natural Environments
Third, the recent development of wearable, on-body sensing and computing, as exemplified by platforms such as LENA2 and Google Glass, creates opportunities to measure behavior under natural conditions (outside the laboratory or clinic) and can even provide a means for real-time feedback via a heads-up display or smartphone. For example, a therapist engaged in social-skills training could be provided with a record of the number and quality of eye-contact events during an interaction with a child, which could support a more accurate assessment of the intervention response and permit more fine-grained tuning of the intervention.
Advancing Behavioral Imaging
Behavioral imaging can thus broadly affect the quality of care for individuals with a developmental or behavioral disorder in two key ways. First, it can provide efficient and objective measurements of behavior without the need for labor-intensive human observation and coding. Second, it can provide both affected individuals and caregivers with real-time social feedback that can help to improve the quality of an interaction—for example, in the context of assessment or therapy.
As part of our on-going work to create behavioral imaging technology, we recently released a dataset to catalyze greater interest and participation in this area within the computer science and engineering communities.
The Multimodal Dyadic Behavior Dataset
The problem of analyzing dyadic social interactions arises naturally in the diagnosis of and interventions for developmental and behavioral disorders. For example, research using video-based microcoding of young children engaged in social interactions has revealed behavioral “red flags” for autism in the first two years of life, specifically in the areas of social, communication, and play skills. Currently, such careful measurement of behavior isn't possible (or practical) in the real-world settings of a pediatric office or daycare classroom. There is much potential for behavioral imaging technology to scale early screening and intervention efforts by bringing reliable, rich measurement of child behavior to real-world settings. The advent of dense measurements of behavior in conjunction with the development of computational models has the potential to transform the study of behavior and development and create a new discipline, which we refer to as computational behavioral science.
An understanding of typical child development is crucial to identifying patterns of deviation from typical behavior that characterize autism. To enable this characterization, we assembled a dataset containing semi-structured play interactions between a child and an adult. Each interaction follows a protocol known as the Rapid-ABC, which specifies a brief (3 to 5 minute) interactive assessment. We recorded and annotated more than 160 Rapid-ABC sessions using multiple time-synchronized sensing modalities (video, audio, and physiological). We recently introduced the Multimodal Dyadic Behavior (MMDB) dataset, which contains this sensor data along with human-coded annotations and some preliminary analysis results.3 Instructions for obtaining the MMDB dataset can be found at www.cbi.gatech.edu/mmdb.
The MMDB dataset was collected in the Child Study Lab (CSL) at Georgia Tech, under a university-approved Institutional Review Board protocol. The CSL is a child-friendly, 300-square-foot laboratory space equipped with a variety of unobtrusive sensing capabilities, including two Basler cameras (1920 × 1080 at 60 frames per second), a Kinect (RGB-D camera) mounted in the ceiling, dual wireless lavalier microphones worn by the child and adult, and four wireless sensors (one worn on each wrist of the child and adult) for recording electrodermal activity (to continuously measure sympathetic nervous system arousal) and three-axis accelerometry.
In the Rapid-ABC protocol, a trained examiner elicits social attention, back-and-forth interaction, and nonverbal communication from a child. These behaviors reflect key socio-communicative milestones in the first two years of life, and their diminished occurrence and qualitative differences in expression have been found to represent early markers of autism spectrum disorders. During these standardized play interactions, the child sits in a parent's lap across a small table from the examiner, who engages the child in a series of five activities and records the child's behavior on an associated score sheet. The score sheet captures the presence of specific discrete behaviors, such as the child making eye contact, along with a rating of engagement on a three-point scale.
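For illustration, the score sheet can be pictured as a small data structure. The activity names, behavior labels, and direction of the engagement scale below are hypothetical, since the form's exact layout isn't specified here:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class ActivityScore:
    behaviors: Dict[str, bool]  # presence of specific discrete behaviors
    engagement: int             # three-point engagement rating (0, 1, or 2)

@dataclass
class RapidABCScoreSheet:
    # One entry per activity in the five-activity protocol
    activities: Dict[str, ActivityScore] = field(default_factory=dict)

sheet = RapidABCScoreSheet()
# Hypothetical scoring of one activity
sheet.activities["greeting"] = ActivityScore(
    behaviors={"eye_contact": True, "smile": False}, engagement=1)
print(sheet.activities["greeting"].behaviors["eye_contact"])  # True
```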
In addition to the sensor data and the score sheet for each session, the MMDB dataset also includes frame-level, continuous annotations of relevant child behaviors. These annotations were produced by research assistants who were trained to reliably code behaviors. These additional annotations include precise onsets and offsets of a child's attention (for example, was the gaze directed at the examiner's or parent's face, a ball, or a book?), vocalizations and verbalizations (words and phrases), vocal affect (laughing and crying), and communicative gestures (such as pointing, reaching, waving, or clapping).
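Such onset/offset annotations are naturally represented as labeled time intervals, from which summary statistics can be derived. The sketch below uses an assumed representation (not the MMDB release format) to compute the total time a child's gaze was directed at the examiner's face:

```python
# Assumed interval representation for frame-level annotations:
# (onset_s, offset_s, label); the actual MMDB format may differ.
gaze_intervals = [
    (2.0, 3.5, "examiner_face"),
    (4.0, 4.5, "book"),
    (6.0, 7.0, "examiner_face"),
]

def total_duration(intervals, label):
    """Sum the duration of all intervals carrying `label`."""
    return sum(off - on for on, off, lab in intervals if lab == label)

print(total_duration(gaze_intervals, "examiner_face"))  # 2.5
```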
Data Sharing
A key aspect of our recruitment process was a multilevel consent form that gave parents the option of sharing their children's data with the broad research community, defined as “any researcher at an accredited institution that implements a process for human subject research.” One of our positive findings was a high level of support among parents for such broad sharing of research data. Specifically, the parents of 121 children consented to the broadest level of sharing, and 43 of these children completed a second session a few months later. This is significant, given that one of the biggest barriers to broad participation by computer scientists in behavioral imaging research is the general lack of publicly available datasets of well-structured and annotated human behavior.
In this context, it's worth mentioning the NSF/NIH-funded Databrary project ( http://databrary.org), which has a similar goal of creating freely available datasets for research in developmental science. In our case, the funding for the MMDB collection was provided by an NSF award to Georgia Tech under the Expeditions in Computing program (see www.cbs.gatech.edu for details; funded institutions are Boston University, Carnegie Mellon University, Georgia Tech, MIT, Northeastern University, University of Illinois at Urbana-Champaign, and University of Southern California).
Data Analysis
To date, we have explored the automatic analysis of three aspects of the MMDB dataset: parsing each session into discrete stages, detecting discrete behaviors (gaze shifts, smiling, and play gestures), and predicting test-administrator engagement ratings. These analyses and results are described in more detail elsewhere3 and are available as part of the dataset release.
The analysis of social interactions in the MMDB dataset introduces several challenges that don't commonly arise in existing datasets. First, the dyadic nature of the interaction makes it necessary to explicitly model the interplay between participating agents. This requires an analysis of the timing between measurement streams, along with their contents. Second, social behavior is inherently multimodal and requires the integration of video, audio, and physiological modalities to achieve a complete portrait of behavior. Third, social interactions are often defined by their strength of engagement and the reciprocity between participants, not solely by the performance of a particular task. Moreover, these activities are often only loosely structured and can occur over an extended duration of time.
More broadly, the analysis of adult-child interactions in the context of assessment and therapy provides a unique opportunity for behavioral scientists and computer scientists to work together to address basic questions about the early development of young children. For example, detecting whether a child's gestures, affective expressions, and vocalizations are coordinated with gazing at the adult's face is critical in identifying whether the child's behaviors are socially directed and intentional.
Another important challenge is to identify the function of a child's communicative acts when directed to a partner. When a child is using vocalizations or gestures, is the child's intention to request that the partner hand over an object or perform an action; to direct the partner's attention to an interesting object; or simply to maintain an ongoing social interaction? Answering these questions in a data-driven manner will require new approaches to assessing and modeling behavior from video and other modalities.
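As a toy illustration of such a coordination check, one could test whether a vocalization or gesture temporally overlaps an interval of gaze at the adult's face. The interval format and overlap tolerance below are assumptions for illustration only:

```python
def is_socially_directed(event_on, event_off, gaze_intervals, tol=0.5):
    """True if the event overlaps (within `tol` seconds) any interval
    in which the child's gaze was directed at the adult's face."""
    return any(
        event_on <= off + tol and event_off >= on - tol
        for on, off, target in gaze_intervals
        if target == "adult_face"
    )

gaze = [(2.0, 3.0, "adult_face"), (5.0, 6.0, "toy")]
print(is_socially_directed(2.5, 2.8, gaze))  # True: overlaps gaze at the face
print(is_socially_directed(5.2, 5.6, gaze))  # False: gaze was on a toy
```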
Measuring Eye Contact
The MMDB dataset is a resource that supports the development of a broad range of behavioral analysis techniques and enables the study of dyadic social interactions in a controlled environment under a consistent protocol. We have also begun to investigate wearable sensor solutions that can provide greater flexibility and operate under more naturalistic conditions. Here we describe the use of a wearable camera system, such as Google Glass, to detect moments of eye contact.
Eye contact is an important aspect of face-to-face social interactions, and measurements of eye contact are important in a variety of contexts. In particular, atypical patterns of gaze and eye contact have been identified as potential early signs of autism, and they remain important behaviors to measure when tracking young children's social development. Despite the developmental importance of gaze behavior in general, and eye contact in particular, no practical methods currently exist for collecting large-scale measurements of these behaviors.
Classical methods for gaze studies require either labor-intensive manual annotation of gaze behavior from video, or the use of screen-based eye-tracking technologies, which require a child to sit still and examine content on a monitor screen. More recently, companies such as Positive Science have developed wearable gaze-tracking systems that can be worn by children. However, these systems are difficult to deploy experimentally, because they require a child to wear a camera on his or her face below the eye (while tethered to a capture system), and they require calibration processes that are challenging and labor-intensive.
Our approach leverages the existence of wearable camera systems for adults, manufactured by companies such as PivotHead and Looxcie, which have unobtrusive form factors and can record video for several hours on a single charge. In the case of PivotHead, the point-of-view (POV) camera is located above the bridge of the wearer's nose, in the center of a pair of glasses. During face-to-face interaction, when a child is making eye contact with an adult (wearing the PivotHead glasses), the child's face will be naturally oriented to face the POV camera, and the child's gaze will be directed approximately along the camera's optical axis. In this configuration, it's possible to estimate the child's gaze direction by analyzing the pattern of pixels in the image of his or her eye.
We use the OKAO Vision system from Omron to detect the child's face in the POV camera video and then analyze the eye regions to predict the gaze direction. We trained a random forest classifier on a training dataset containing hand-coded ground truth eye contact labels. The classifier analyzes the output of the video analysis to determine if and when eye contact has occurred. This process is illustrated in Figure 1 and described in more detail elsewhere.4
Figure 1. Examiner-child interaction: (a) the examiner is wearing point-of-view glasses, while the child is unencumbered; (b) the results of eye-contact detection are denoted by the red box around the child's face.
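The classification stage described above can be sketched with an off-the-shelf random forest, as in scikit-learn. The features below are synthetic stand-ins for the eye-region measurements a system like OKAO would provide, and the clean class separation is contrived for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200

# Synthetic 4-D eye-appearance features: frames with eye contact have
# gaze features near zero (aligned with the camera's optical axis),
# while averted-gaze frames are shifted away.
X_contact = rng.normal(loc=0.0, scale=0.1, size=(n, 4))
X_averted = rng.normal(loc=0.5, scale=0.2, size=(n, 4))
X = np.vstack([X_contact, X_averted])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = eye contact

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# A frame whose features lie near the camera axis is classified
# as an eye-contact event.
print(clf.predict([[0.02, -0.01, 0.03, 0.0]]))
```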

Behavioral imaging is a new research area with connections to a broad range of disciplines, including computer vision, audio and speech analysis, wearable sensing, ambulatory psychophysiology, and pervasive computing. It encompasses the use of computational sensing and modeling techniques to measure and analyze human behavior. Here, we've introduced our research program focused on the study of dyadic social interactions between children and their caregivers and peers. We believe this approach can lead to novel, technology-based methods for screening children for developmental conditions such as autism, techniques for monitoring changes in behavior and assessing the effectiveness of intervention, and the real-time measurement of health-related behaviors from on-body sensors that can enable just-in-time interventions.
1. J. Cordero et al., “CDC/ICDL Collaboration Report on a Framework for Early Identification and Preventative Intervention of Emotional and Developmental Challenges,” Interdisciplinary Council on Developmental Learning Disorders, 11 Nov. 2006.
2. D.K. Oller et al., “Automated Vocal Analysis of Naturalistic Recordings from Children with Autism, Language Delay, and Typical Development,” Proc. Nat'l Academy of Sciences, vol. 107, no. 30, 2010, pp. 13354–13359; doi: 10.1073/pnas.1003882107.
3. J.M. Rehg et al., “Decoding Children's Social Behavior,” Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR 13), June 2013, pp. 3414–3421; doi: 10.1109/CVPR.2013.438.
4. Z. Ye et al., “Detecting Eye Contact Using Wearable Eye-Tracking Glasses,” Proc. 2nd Int'l Workshop Pervasive Eye Tracking and Mobile Eye-Based Interaction (PETMEI 12, held in conjunction with UbiComp 12), 2012, pp. 699–704.

James M. Rehg is a professor in the College of Computing at the Georgia Institute of Technology. Contact him at rehg@gatech.edu.

Agata Rozga is a research scientist in the College of Computing at the Georgia Institute of Technology. Contact her at agata@gatech.edu.

Gregory D. Abowd is a Regent's Professor and a Distinguished Professor in the College of Computing at the Georgia Institute of Technology. Contact him at abowd@gatech.edu.

Matthew S. Goodwin is an assistant professor in the Department of Health Sciences and the College of Computer and Information Sciences at Northeastern University. Contact him at m.goodwin@neu.edu.