Multimodal perception strives to integrate information from multiple sensory channels into a unified experience that contains more information than just the sum of the separate unimodal percepts. As a result, traditional quality metrics for unimodal services cannot reflect the perceived quality in multimodal situations, and new quality estimation methods are needed. In this work, audiovisual perception was studied with an immersive audiovisual display consisting of a video screen with a 226° field of view and 3D sound reproduction over 20 loudspeakers. The aim of the study was to observe the crossmodal interaction of the auditory and visual modalities when the spatial widths of the audio and video reproduction were limited. In a subjective study, the overall perceived degradation of the stimuli was evaluated with Degradation Category Rating (DCR) for four different types of audiovisual content. In addition, free descriptions of the most prominent degrading factors were collected. Prior to the experiment, the participants' individual tendencies to experience immersion were screened with a questionnaire. The results show that video width is the dominant factor in determining the perceived degradation of a stimulus. Audio width also had an impact when the video width was at its maximum. Individual tendency to experience immersion was not found to have a significant impact on perceived degradation in this study. Slight content effects were observed. Constrained correspondence analysis of the free-description data suggests that the highest perceived degradation was caused by wrong audio direction, reduced video width, and missing essential content.