Competing interest statement
Conflict of interest: the authors have no conflict of interest to declare.
Musical performance, like speech, is one of the most complicated tasks performed by people. Skilled musicians have spent thousands of hours rehearsing by the time they are 21 years of age,1 which is assumed to be the main reason for their increased auditory perceptual abilities as well as the structural and functional modifications noticed at the cortical and subcortical levels for music and speech.2-4
Weak signals embedded in modulated noise can be detected more efficiently than those embedded in unmodulated noise. An increase in noise bandwidth facilitates signal detection in modulated noise, a psychoacoustic phenomenon called comodulation masking release (CMR).5 The detection threshold of a sinusoidal signal in an on-frequency masker can be improved by presenting additional off-frequency maskers that have the same envelope fluctuations across frequency bands.6 CMR can manifest itself through within-channel and across-channel mechanisms.7 The within-channel process is based on changes in the temporal envelope within a filter band, while the across-channel mechanisms are mediated by the temporal correlation between on-frequency and off-frequency filter bands. CMR can be calculated as the difference between the threshold of a sinusoidal signal with an unmodulated masker (UM) and its threshold with a comodulated masker (CM) at the same bandwidth.5 CMR for a masker bandwidth larger than the critical band around the signal will be greater than CMR for a bandwidth equal to the critical band. This difference is known as a true or across-channel CMR.6,8
The brainstem response to complex stimuli (cABR) can be used to evaluate the integrity of auditory functioning by showing evidence of the activity of the subcortical nuclei.9,10 The cABR shows the neural encoding of acoustical features of a stimulus with significant reliability.11 Musicians have displayed enhanced subcortical encoding of stimulus features by giving faster responses than non-musicians.12 Moreover, auditory brainstem responses are flexible and are influenced by linguistic experience.13 CMR is a complex task requiring auditory stream segregation. Formation of an auditory stream requires the fundamental ability to appropriately represent, group and store auditory units, and it is considered to be an essential aspect of CMR. Research has shown that listeners with greater attention and working memory are less affected by maskers and perform better on speech-perception-in-noise tasks.14,15 Because musicians can separate out the sounds of instruments in an orchestral performance (stream segregation), it has been hypothesized that a musician’s life experience of musical stream segregation may result in improved comodulation masking release. The current study measured psychoacoustical comodulation masking release in musicians and non-musicians, then recorded cABR in quiet and in comodulated and unmodulated maskers to investigate the effect of musical training on the neural representation of comodulation masking release. It was hypothesized that musicians have less-degraded brainstem responses in the presence of a comodulated masker than non-musicians.
Materials and Methods
The participants were 36 right-handed, normal-hearing adults 18 to 30 years of age with no history of audiological, otological or neurological disorders. Their audiometric thresholds were 15 dB HL or better at octave intervals from 250 to 8000 Hz. The subjects classified as musicians (N = 19) had 10 years or more of experience and had started training before the age of seven. They had practiced at least four times weekly over the three years before registering for the study. Non-musicians (N = 17) were those who did not meet the musician criteria.
Separate psychoacoustical and electrophysiological experiments were performed. All stimuli were processed in MATLAB R2014a. In the psychoacoustical experiment, the signal was a pure tone at 700 Hz, 300 ms in duration. The masker consisted of seven noise bands: an on-frequency band centered at 700 Hz and six flanking bands centered at 300, 400, 500, 900, 1000 and 1100 Hz, each with a bandwidth of 24 Hz and a duration of 1000 ms. Both the on-frequency and flanking bands were 100% amplitude-modulated (AM) tones. In the electrophysiological experiment, the signal was the 40 ms speech syllable /da/, containing an initial stop consonant burst followed by a consonant-to-vowel transition, synthesized using a Klatt synthesizer.16 For the masked conditions, a speech-shaped noise was presented in two modes, with and without comodulation (CM and UM, respectively). In the comodulated condition, the maskers were modulated with a 100% amplitude modulation depth. The stimuli were converted from digital to analogue (44,100 Hz sampling rate; 16 bit) and equalized for RMS power. They were amplified using a programmable attenuator (TDT PA5) and a headphone buffer (TDT HB7). In both experiments, the signal and masking noise were presented to the right ear while the left ear remained silent.
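As an illustration only, the difference between an unmodulated and a comodulated multi-band masker can be sketched in Python. The band frequencies, duration and sampling rate are taken from the description above. The 12 Hz modulation rate, the raised-cosine envelope shape, the target RMS value and all function names are assumptions made for this sketch and are not stated in the text. The sketch does not reproduce the MATLAB processing used in the study.

```python
import math

FS = 44100      # sampling rate from the text (Hz)
DUR = 1.0       # masker duration from the text (s)
BAND_FREQS = [300, 400, 500, 700, 900, 1000, 1100]  # flanking bands plus the 700 Hz on-frequency band
MOD_RATE = 12.0  # ASSUMPTION: the modulation rate is not given in the text


def envelope(t, mod_rate=MOD_RATE):
    """Raised-cosine envelope with 100% depth (0 to 1). ASSUMED shape."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * mod_rate * t))


def rms(x):
    """Root-mean-square value of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def make_masker(comodulated, target_rms=0.1):
    """Sum of tonal carriers; one shared envelope is applied when comodulated."""
    n = int(FS * DUR)
    out = []
    for i in range(n):
        t = i / FS
        carriers = sum(math.sin(2.0 * math.pi * f * t) for f in BAND_FREQS)
        gain = envelope(t) if comodulated else 1.0
        out.append(gain * carriers)
    # Equalize RMS so both masker types have the same overall power.
    scale = target_rms / rms(out)
    return [v * scale for v in out]


cm_masker = make_masker(comodulated=True)
um_masker = make_masker(comodulated=False)
```

Applying a single envelope to every band is what makes the bands comodulated: their temporal minima coincide, which is the cue that across-channel comparison can exploit.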
The psychoacoustical experiment took place in a double-walled soundproof booth. The signal and masker were presented to the right ear of listeners through a TDH-39 headphone. The masker was presented at an intensity of 60 dB SPL for a duration of 1000 ms. The signal was presented in the last third of the masker. All tests were performed using a three-alternative forced-choice procedure with adaptive signal-level adjustment. Each trial contained three intervals separated by gaps of 500 ms. The signal was presented in a randomly chosen interval and the task was to indicate which interval contained the signal. The signal level started at 70 dB SPL and was adjusted according to a two-down, one-up procedure to estimate the 70.7% point of the psychometric function.17 The initial step size was 4 dB, which was halved at every second reversal until the step size reached 1 dB. The procedure then continued for six reversals. The mean level of these final six reversals was taken as the estimated threshold. The final threshold was the mean of three such estimated thresholds.
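The adaptive rule described above can be sketched as follows. This is a minimal illustration, not the software used in the study. The function name, the `respond` callback and the `max_trials` safety limit are hypothetical, and the listener is replaced by an idealized one who answers correctly whenever the level is at or above a made-up threshold of 45 dB SPL.

```python
def two_down_one_up(respond, start_level=70.0, first_step=4.0,
                    min_step=1.0, final_reversals=6, max_trials=1000):
    """Two-down, one-up staircase; returns the mean of the final reversal levels.

    respond(level) -> True if the (simulated) listener is correct at that level.
    """
    level = start_level
    step = first_step
    correct_in_row = 0
    last_direction = 0        # -1 = level went down, +1 = level went up
    reversals_at_step = 0     # reversals seen at the current (large) step size
    final_levels = []         # reversal levels collected at the minimum step
    for _ in range(max_trials):
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue      # need two correct answers before going down
            direction = -1
        else:
            direction = +1    # one error is enough to go up
        correct_in_row = 0
        if last_direction and direction != last_direction:
            # The track changed direction: this trial is a reversal.
            if step > min_step:
                reversals_at_step += 1
                if reversals_at_step == 2:
                    step = max(min_step, step / 2.0)  # halve every second reversal
                    reversals_at_step = 0
            else:
                final_levels.append(level)
                if len(final_levels) == final_reversals:
                    return sum(final_levels) / final_reversals
        last_direction = direction
        level += direction * step
    raise RuntimeError("staircase did not finish within max_trials")


true_threshold = 45.0  # hypothetical listener threshold in dB SPL
estimate = two_down_one_up(lambda level: level >= true_threshold)
```

With this idealized listener the track settles into alternating reversals at 44 and 45 dB SPL, so the estimate is 44.5 dB SPL. A real listener responds probabilistically, and the same rule then converges on the 70.7% point rather than on a sharp boundary.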
The masked signal thresholds were obtained under three masking conditions. In the first condition, the masker consisted of two independent narrow bands of noise (on-frequency and flanking bands) without modulation and was denoted as the UM condition. In the CM condition, both the on-frequency band and flanking bands had the same envelope of square-wave modulation. In the reference (RF) condition, the threshold was measured for the modulated on-frequency masker alone. CMR was calculated by subtracting the CM threshold from the UM threshold. The true or across-frequency CMR (AF-CMR) was quantified by subtracting the threshold obtained for the CM condition from the threshold for the RF condition, so that a benefit from the comodulated flanking bands yields a positive value.
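The two difference measures can be written out as a short worked example. The function names are hypothetical. The across-frequency measure is taken here as the RF threshold minus the CM threshold, the sign convention under which a benefit from the flanking bands is positive. The input values are the musician group means reported in the results, used only to show the arithmetic.

```python
def cmr(um_threshold_db, cm_threshold_db):
    """Overall CMR: unmodulated-masker threshold minus comodulated-masker threshold."""
    return um_threshold_db - cm_threshold_db


def af_cmr(rf_threshold_db, cm_threshold_db):
    """Across-frequency CMR: reference (on-frequency only) threshold minus CM threshold."""
    return rf_threshold_db - cm_threshold_db


# Musician group means (dB SPL) as reported in the results.
um, cm, rf = 55.89, 42.72, 46.96
overall = round(cmr(um, cm), 2)          # 13.17
across = round(af_cmr(rf, cm), 2)        # 4.24
```

Differences of group means can deviate slightly from the mean of per-listener differences because of rounding in the reported values.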
The cABR was recorded with a Biologic Navigator Pro system running BioMARK software (Natus Medical; USA), and all experiments were carried out in a double-walled, electrically and acoustically shielded sound booth. The subjects were seated in a comfortable chair. During the experiment, listeners could watch a subtitled movie of their choice while remaining quiet and motionless.
Electrophysiological responses were recorded using a vertical array of three Ag-AgCl electrodes (vertex Cz active, high forehead ground, ipsilateral earlobe reference). Throughout data recording, electrode impedance was less than 5 kΩ and the inter-electrode impedance difference was below 3 kΩ. An online bandpass filter of 100 to 2000 Hz was applied. Online artifact rejection was applied with a criterion of ±23 μV. The time window was 75 ms, including a 15 ms pre-stimulus period. Two blocks of 3000 artifact-free sweeps were collected and averaged. The recording sessions lasted for about 2 h. The electrophysiological responses to the syllable /da/ were recorded in quiet and in the masking conditions (CM, UM). The stimulus was presented to the right ear of listeners through an insert earphone (ER-3; Etymotic Research) at 80 dB SPL at a rate of 10.9 Hz with alternating polarity. In the masking conditions, the stimulus was presented at a signal-to-noise ratio of +10 dB, which was selected based on pilot tests. Both the stimulus and the maskers were presented through the insert earphone in the right ear while the left ear was kept in silence. The experiment and data collection were completed without interruption in one session. The latency and amplitude of the onset peak (V), onset trough (A), transient peak (C), FFR peaks (D, E, and F) and offset peak (O) of the cABR were analyzed off-line.
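The sweep-averaging step with amplitude-based artifact rejection can be illustrated with a short sketch. It assumes that a sweep is rejected when any sample exceeds the ±23 μV criterion; the text does not describe the exact rule used by the recording system, and the function name and toy data are hypothetical.

```python
def average_sweeps(sweeps, reject_uv=23.0):
    """Average equal-length sweeps, discarding any sweep that exceeds the criterion.

    sweeps: list of lists of sample values in microvolts.
    Returns (averaged_waveform, number_of_sweeps_kept).
    """
    kept = [s for s in sweeps if max(abs(v) for v in s) <= reject_uv]
    if not kept:
        raise ValueError("all sweeps were rejected")
    n = len(kept)
    # zip(*kept) walks the sweeps sample by sample.
    averaged = [sum(samples) / n for samples in zip(*kept)]
    return averaged, n


# Toy data: the third sweep contains a 30 microvolt artifact and is dropped.
toy_sweeps = [[1.0, 2.0], [3.0, 4.0], [30.0, 0.0]]
waveform, n_kept = average_sweeps(toy_sweeps)
```

Averaging many accepted sweeps reduces activity that is not time-locked to the stimulus, which is why thousands of sweeps are collected for each condition.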
Statistical analysis was conducted in SPSS v. 18.0 software. The Kolmogorov-Smirnov test indicated that all data followed a normal distribution (P>0.05). No statistically significant differences were found between groups for age and pure-tone audiometry. Figure 1 shows the average psychoacoustic thresholds of the signal for the three conditions (CM, UM and RF) in both groups. The musicians demonstrated lower thresholds than non-musicians for all conditions. Independent-samples t-tests were conducted to compare the signal thresholds of musicians and non-musicians in the three conditions. The masked thresholds differed significantly between musicians (CM condition: M = 42.72, SD = 1.49; UM condition: M = 55.89, SD = 1.16; RF condition: M = 46.96, SD = 2.02) and non-musicians (CM condition: M = 47.50, SD = 2.32; UM condition: M = 58.14, SD = 2.32; RF condition: M = 50.10, SD = 2.24) in every condition (CM: t(34) = 7.41, P<0.05; UM: t(34) = 3.72, P<0.05; RF: t(34) = 4.40, P<0.05). These results demonstrate the positive effects of musical training on the detection of signals in noise.
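For readers who wish to check the group comparisons, an independent-samples t statistic can be approximated from the reported summary statistics alone. The sketch below assumes the equal-variance (pooled) form of the test, which is consistent with the 34 degrees of freedom reported for 19 and 17 participants. The function name is hypothetical, and the analysis in the study was run in SPSS, not with this code.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t (equal variances assumed) from group means, SDs and sizes.

    Returns (t, degrees_of_freedom); t is positive when mean2 > mean1.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / standard_error, df


# CM condition: musicians (N = 19) versus non-musicians (N = 17).
t_cm, df_cm = pooled_t(42.72, 1.49, 19, 47.50, 2.32, 17)
```

Because the inputs are rounded to two decimals, the recomputed statistic is close to, but not identical with, the reported t(34) = 7.41.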
The average CMR and AF-CMR of both groups are shown in Figure 2. The musicians demonstrated significantly higher CMR than the non-musicians (musicians: M = 13.16, SD = 1.85; non-musicians: M = 10.49, SD = 3.25; t(34) = 3.06, P<0.05). The average AF-CMR was also significantly greater for the musician group (M = 4.23, SD = 2.25) than for the non-musician group (M = 2.50, SD = 1.03; t(34) = 2.89, P<0.05). The musicians thus demonstrated greater comodulation masking release than the non-musicians.
Figure 3 is a time-domain representation of the grand average brainstem responses to the syllable /da/ from –15 to 60 ms in quiet and in the comodulated and unmodulated masking conditions (CM, UM) at a signal-to-noise ratio of +10 dB. The effect of the masking conditions on the brainstem responses was examined using a one-way analysis of variance (ANOVA). For all peaks in both groups, the ANOVA revealed a significant reduction in amplitude and a significant increase in latency in both masking conditions (P<0.05), and these changes were significantly greater for the UM than for the CM condition. The peak latencies and amplitudes of responses for the musician and non-musician groups were compared for response peaks V, A, C, D, E, and O. For all peaks, the independent-samples t-tests revealed that the two groups had nearly the same average peak latencies and amplitudes in quiet and in the UM masking condition (P>0.05). The musicians showed significantly greater amplitudes and earlier response timing than non-musicians for the syllable /da/ in the presence of the comodulated masker (Table 1). Musicians thus showed greater comodulation masking release than non-musicians, in agreement with the results of the psychoacoustical experiment.
In accordance with the hypothesis, the musicians performed better on CMR and on the brainstem correlates of CMR. In previous studies, better speech perception in the presence of noise was observed in musicians for prosody,18 melody,19 pitch,20 temporal components and speech discrimination.21,22 These results suggest that musical training enhances the neural processing of speech. Moreover, musicians have shown enhanced attention and working memory.23,24 The present data show that a musician’s life experience of musical stream segregation results in improved comodulation masking release. Because perceptual cues are important for segregating the target signal from background noise, listeners with enhanced auditory perceptual skills can identify fine acoustical signals and show improved auditory stream segregation.
The Hebbian principle states that connections between neurons that are simultaneously active are strengthened rather than weakened over time.25 This can be one explanation for the increase in the abilities of musicians. It is possible that extended music practice improves neural connections. The results indicate that the Hebbian principle can apply to learning at lower stages (bottom-up) and also at higher stages (top-down). The nervous system extracts the relevant signal and suppresses irrelevant noise in bottom-up and top-down processing.26 Interactive bottom-up and top-down processing results in subcortical plasticity with musical training.
The outcomes of the present study indicate that musicians have the potential to benefit from brief temporal minima in modulated background noise to catch signal cues (also known as listening-in-the-dips). Masking release decreases for temporally flat, spectrally steady-state noise, which is likely to mask the weaker portions of the signal.27 Both spectral and temporal resolution are required for listening-in-the-dips. The results indicate that lifelong experience with stream segregation improves neural signal encoding and enhances representation of the speech signal in comodulated noise. It can be concluded that musical experience is an advantage for the dip-listening mechanism.
The results indicate that the average true or AF-CMR is significantly greater for the musician group than for the non-musician group. Across-frequency processing can therefore also explain the enhanced CMR in the musician group. CMR is a complex task requiring across-frequency processing.7 Across-frequency modulation processing can interact with other processes to support auditory grouping, stream segregation and auditory object formation. The present data show that a grouping mechanism and masking release are associated with one another. These results indicate that musicians performed better on across-frequency modulation processing, auditory grouping and stream segregation.
The data from the current study indicate that musical experience is an advantage for comodulation masking release. Musicians had enhanced subcortical representations of the syllable /da/ in comodulated maskers. Thus, musicians demonstrate improved neural synchrony and less-degraded brainstem responses in a comodulated masker than non-musicians. In agreement with the results of the psychoacoustical experiment, musicians showed greater comodulation masking release than non-musicians. It can be concluded that musical experience is an advantage for the dip-listening mechanism and across-frequency processing. These results suggest a physiological explanation for the psychoacoustical enhancement of comodulation masking release in musicians.