Does amplitude envelope play a role in the perception of audio-visual event congruency?

Our perceptual system must constantly work to identify sights and sounds originating from common sources. Although coincidence in time and space is typically thought to drive decisions regarding audio-visual integration, Vatakis & Spence (2007) demonstrated greater binding for gender-matched vs. mismatched faces and voices. However, subsequent research on non-speech stimuli found no support for the unity assumption when using musical notes played on the piano and guitar, suggesting that this important process may be unique to speech (Vatakis & Spence, 2008).

Intrigued by the use of cues beyond time and space in assessing multi-modal congruency, we noticed that the guitar and piano notes used in previous studies exhibit similar amplitude envelopes. This raises questions about the role of amplitude envelope in triggering the unity assumption, and answering them would broaden our understanding of multi-modal integration. To explore the issue, we recorded videos of instruments producing notes with clearly differentiated amplitude envelopes:

Video 1:

Cello SOA 200 VFIRST from MAPLE Lab on Vimeo.

Video 2:

MarVid CelAud SOA 200 AFIRST from MAPLE Lab on Vimeo.

The audio-visual time lag in both of these videos is 200 ms, but you may have found the lag more difficult to detect in Video 1. This demonstrates the power of the unity assumption: the understanding that sight and sound originate from a common event. Our findings now appear in Attention, Perception, & Psychophysics (Chuen & Schutz, 2016), available from our publications page.