Reproducing Reality: Perception and Quality in Immersive Audiovisual Environments

Abstract

The research in this thesis falls into two interrelated categories: understanding human perceptual information processing under natural audiovisual conditions and, based on this knowledge, evaluating the quality of experience and content perception in immersive audiovisual reproductions. The main concept guiding this work is the observation that there is no need to simulate the whole world if it cannot be perceived.

The three main findings in the first category are: 1) The most important perceptual attributes in natural scenes depicting urban environments were found to be the amount of movement, perceived noisiness, and openness of the scene. 2) Movement and openness were found to be mainly visual attributes. In some scenes, the auditory system was able to derive information about movement and openness comparable to that obtained under audiovisual conditions after as little as 500 ms of stimulation. Noisiness was predominantly auditory, but visual information was found to be an aiding factor. Cross-modal effects on global estimates of the scene attributes were found for movement and openness. 3) Task-relevant auditory cues were found to aid in orienting to and detecting a peripheral but not a central visual target. Significant improvements were found with a 1000 ms audio lead compared to synchronous onset, and with a 500 ms audio lead compared to the no-sound condition.

The main findings in the second category are: 1) The spatial extent of the reproduction setup affects the perception of natural scene attributes, especially movement, where discrimination accuracy decreases with larger reproduction extent, and noisiness, where discrimination accuracy increases. 2) The effect of the spatial width of audio on quality of experience is strongest with full video width; as the video width is reduced, the effect of audio width almost disappears. 3) Immersive 3D sound can hinder the perception of visual events in video reproduction by dispersing visual attention. 4) In 2D video combined with 3D spatial sound, the perceived spatial discrepancy between the auditory and visual events may become annoying if the observer is seated close to the screen and off-axis.

Type
Publication
Doctoral Thesis, Aalto University