Hearing impaired people and cochlear implant (CI) users in particular often experience difficulties regarding speech intelligibility and acoustic annoyance in noisy environments.1 There are many types of environmental noise which differ in their temporal and spectral structure. For example, the envelope and spectral content of stationary noise, such as the sound of a fan running at constant speed, remain virtually constant over time. Transient noise, conversely, is characterized by rapid level fluctuations, like for example the clattering of dishes, door slams or typing on a keyboard. Transients are defined by an impulsive onset rise time (down to fractions of a millisecond) followed by a fast decay (tens of milliseconds) and a total duration of less than one second.2,3 Due to these properties transients are not treated appropriately by conventional single-channel noise reduction algorithms, which are designed to reduce stationary or slowly varying noises.4 A different class of algorithms are beam formers or spatial filtering algorithms. These algorithms do not make assumptions regarding the stationarity of the noise and they are only effective when signal and noise arrive from sufficiently distinct angles (whereby the signal is typically assumed to arrive from the front).5 Hence, in order to treat transients in audio signals too, several hearing aid companies have developed dedicated transient noise reduction (TNR) algorithms.6,7 Different studies with hearing aid users have yielded somewhat inconclusive outcomes regarding their effectiveness.4,8-10
Nevertheless, to find out the effect of TNR on CI users, we have previously tested one such algorithm provided by hearing aid manufacturer Phonak.11 It detects noise transients whenever the signal envelope exceeds a certain slope, sound pressure level and magnitude. Upon detection, the affected segment is then reduced in level by a short-lasting broadband attenuation with instantaneous onset and fast release. To evaluate the suitability of this algorithm, we conducted a range of tests including speech intelligibility in transient noise and quiet as well as subjective ratings of speech clarity, comfort and overall preference. The noises used in that study were hammer blows (repeated at a rate of 4 Hz) and dishes clattering. Speech intelligibility improvements of 1.7 dB in hammer blows and 0.4 dB in dishes clattering were observed as well as better speech clarity in hammer blows. Regarding comfort and preference ratings, no significant differences were found. Since CI users require higher SNRs than normal hearing listeners we sought to further improve the algorithm performance specifically under such conditions. The following study investigates the effect of a novel TNR algorithm in CI users on speech intelligibility and subjective perception in more realistic noise and more types of noise. All experiments were conducted in compliance with the declaration of Helsinki and approved by Hannover Medical School Ethics Commission (permission number 6569).
Materials and Methods
Description of the multi-band transient noise reduction algorithm
For our present study, we sought to extend the previous single-band algorithm so as to improve its efficacy, particularly with regards to cases were speech and transient noise occur at the same time. Based on a comparison of the long-term spectra of speech and several exemplary real-life transient noises (Figure 1),12 we hypothesized that a limiting factor in the general design of the original algorithm may have been the broadband nature of both transient detection and gain application. Overall, the long-term spectra of many transient noises are wide-band than speech and contain dominant peaks above 1 kHz whereas the long-term speech spectrum does not. An ongoing speech signal may hence mask a high-frequency transient noise in the envelope of the time-domain signal, making the detection of the latter impossible. Upon detection, conversely, there is no need to apply attenuation to parts of the spectrum that do not contain noise energy, or where the noise is energetically masked by concurrent components of speech (at the risk of creating audible speech distortions). Together, these considerations suggest that a multi-band approach (where detection and gain application are performed independently in different frequency regions) may be advantageous for improving sensitivity while limiting possible side effects.
We extended the existing algorithm by first splitting the input audio signal into four separate frequency bands before applying the original transient detection mechanism independently in each band. Cutoff frequencies and corresponding bandwidth of frequency bands is: 0-1 kHz, 1-1.9 kHz, 1.9-6.6 kHz and 6.6-9.4 kHz. The lowest band was chosen so as to capture the highest-energy, low-frequency components of speech including the fundamental frequency and first formant region.13 The second band was chosen to approximately cover the range of second formant frequencies. The third band includes high-frequency speech components such as third formants and high-frequency consonants. The fourth band was chosen to contain only the highest frequencies, which carry little speech-relevant information.
Transient detection in each band was carried out as in the original single-band algorithm. The attenuations consequently applied were proportional to the band-specific transient amplitudes. These were continuously estimated with a method similar to that proposed by Hirszhorn et al. and limited to a maximum between 10 and 30 dB.14
15 experienced CI users implanted with an Advanced Bionics HiRes90k or CII implant participated in this study. The average age was 61 years (ranging from 49 to 72 years) with a wearing experience of 6 years on average (1 to 13 years). All participants were using a Harmony or Naida CI behind-the-ear speech processor in their everyday life running the coding strategy HiRes F120 or HiRes Optima.15 12 of them had at least one clinical program with the stationary noise reduction algorithm ClearVoice.1
The algorithm was implemented on a laptop that was used to generate all sound tokens presented during testing. The tokens were delivered to a speech processor via an external soundcard (Roland Cakewalk UA-1G) connected to the auxiliary audio input jack of a Harmony speech processor with a Direct Connect ear hook and cable.
The multi-band algorithm (TNRmult) and the previously tested single-band version (TNRsgl) were compared to the reference condition without transient noise reduction (TNRoff; Figure 2). The testing was conducted unilaterally whereby bilaterally implanted participants were tested on the better ear. The contralateral ear was obstructed with an earplug if the participant had residual hearing. A Harmony speech processor was programmed with the participant’s everyday program. ClearVoice was always deactivated and all participants used the clinical standard setting of the automatic gain control (AGC) circuit.16
Test procedure and noises
The subjects performed tests assessing speech intelligibility in transient noise and soft stationary noise, subjective sound quality ratings and potentially harmful distortions of warning signals over the course of two acute appointments (Table 1). All tests were single-blinded except for the warning signal test, where TNRoff was clearly labeled as reference owing to the high variability amongst different exemplars of the same type of signal in the real world. Test lists and processing conditions were randomized within appointments for the speech tests and the subjective ratings over all participants.
The speech tests in transient noise were performed with the Oldenburg sentence test (OLSA) using two different types of transient noises, cafeteria and office, both presented at a fixed noise level of approximately 80 dB (Table 2).17 The speech level varied adaptively according to each participant’s performance, yielding the 50% speech reception threshold (SRT), i.e., the SNR at which the subject can understand 50% of the words. Noise was presented intermittently, preceding the target onset by 1 s and following the target offset by 0.5 s. This measure ensured that the gain provided by the AGC had settled before the speech onset.
The speech test in soft stationary noise was conducted to assess the impact of the algorithm on speech in an almost quiet environment. Applied was the Hochmair-Schulz-Moser-sentence test (HSM) as one of the standard non-adaptive tests for CI users at Hannover Medical School.18 HSM sentences were presented at a fixed speech level of 65 dB in a background of 50 dB continuous HSM-speech-shaped noise, yielding the percentage of correctly identified words as a result. The background noise was added to avoid ceiling effects, which were observed in previous study in the same test in quiet. On the basis of high SNR and stationary character of the background noise this measure should have little to no effect on the behavior of the algorithm compared to speech in quiet. Subjective sound quality ratings included three types of assessment: speech clarity in noise, comfort and overall preference. Speech clarity and comfort were rated by presenting two randomly chosen OLSA sentences joined together and mixed with cafeteria or office noise (Table 1). As in the speech test, the noise preceded and followed the sentences by 1 s and 0.5 s, respectively. Speech levels were set to 3 dB above each participant’s individually determined SRT from the earlier speech test to ensure an appropriate audibility and intelligibility. Comfort was further tested on a number of noises presented without speech in a background of 55 dB continuous speech-shaped noise in order to allow the AGC settle to a moderate background level (Table 2). Participants used a touchscreen interface to play back the three processing variants (TNRoff, TNRsgl, TNRmult) of a particular sound token (repeatedly and in arbitrary sequence) before rating them regarding both speech clarity and noise comfort. The rating scale for both criteria ranged from 0 to 100, subdivided into the following five labeled categories: 0-20 bad, 21-40 poor, 41-60 fair, 61-80 good and 81-100 excellent. After performing the ratings for a given sound token, participants had to specify an overall preference order of the three processing conditions (shared ranks permitted) before moving on to the next trial.
The warning signal test was used to assess the potential impact of TNRmult on the perception of common alarm sounds. In particular, participants were asked whether the processing with TNRmult caused audible distortions or level difference compared to the reference condition TNRoff for a number of exemplary signals (Table 1). For loudness, participants could rate TNRmult as being louder, equally loud or softer than TNRoff. In terms of distortion, possible ratings were: virtually identical, recognizable but distorted and unrecognizable.
As most of the collected measurements (or their paired differences) were not normally distributed (Shapiro-Wilk-Test, P<0.05), non-parametric tests were subsequently applied for further statistical analysis. Unless stated otherwise, Friedman’s test was used as family-wide test for differences in median amongst a given set of repeated measures, followed by Conover’s post-hoc test wherever the Friedman test yielded a positive outcome (α=0.05 for both tests).19 No formal statistical analysis was performed on the answers collected regarding distortion and loudness of warning signals.
Speech intelligibility in transient noise
Friedman’s test showed an effect (P<0.0004) of processing conditions on speech intelligibility in cafeteria noise (Figure 3A). Conover’s post-hoc test revealed an improvement with TNRmult over both TNRoff and TNRsgl of 2.4 dB and 2.9 dB respectively (P<0.05). SRTs with TNRoff were better than with TNRsgl by 0.5 dB. For the speech tests in office noise, Friedman’s test also revealed significant differences in medians (P<0.002; Figure 3B). Here, TNRmult performed better than TNRoff and TNRsgl by 1.5 dB and 3.0 dB respectively (P<0.05). However, no difference was observed between TNRoff and TNRsgl.
Speech intelligibility in soft stationary noise
With only two conditions to compare, Wilcoxon’s sign test was used on the HSM word scores obtained in continuous speech-shaped noise. No difference between TNRmult and TNRoff was found (P>0.108; Figure 3D) with ceiling performance levels in both conditions (99.1% median each).
Subjective ratings of comfort in cafeteria noise were influenced by processing (P<0.002; Figure 4A). Post-hoc tests showed a significant improvement with 62 points for TNRmult over TNRoff and TNRsgl with 40 and 50 points respectively (P<0.05). Friedman’s test also revealed an effect for aluminum crackling (P<0.017), where the score of 63 with TNRmult was higher than for TNRoff and TNRsgl with 50 points each (P<0.05). No effects were found for the other sounds.
Regarding subjective ratings of speech clarity Friedman’s test revealed an effect of processing in cafeteria noise (P<0.004; Figure 4B). Conover’s test showed that the median score of 69 with TNRmult was better than those with TNRoff and TNRsgl with 50 points each (P<0.05). No effect was found for speech clarity in office noise.
Regarding the overall preference ranking test, Friedman’s test found an effect when processing office noise (P<0.002; Figure 4C). Conover’s test subsequently revealed that TNRmult with a median rank of 1.5 was judged as better than TNRoff with a median rank of 2.5 (P<0.05). An effect was also found for aluminum crackling (P<0.002), where the median rank of 1.0 for TNRmult was better than those for TNRoff and TNRsgl each with a rank of 2.0 (P<0.05). No main effects were found for the remaining sounds.
Most participants rated the sound of the four warning signals processed with TNRmult as virtually identical to the unprocessed baseline TNRoff (Figure 5A). Bicycle bell and fire alarm were rated as virtually identical 13 times, ambulance siren 14 times and car honking 12 times. All remaining participants rated the processed signals as recognizable but distorted, no one perceived them as unrecognizably distorted. Regarding loudness quieter was rated for bicycle bell 2 times, fire alarm zero times, ambulance siren 6 times and car honking 8 times (Figure 5B). An equal loudness was rated for bicycle bell 12 times, fire alarm 14 times, ambulance siren 8 times and car honking 7 times. Louder was rated one time for bicycle bell, ambulance siren and fire alarm.
In this study we tested a novel transient noise reduction algorithm with CI users, which operate in multiple frequency bands. The performance of the algorithm (TNRmult) was compared to an existing single-band version (TNRsgl) and an unprocessed baseline condition (TNRoff). Speech test in noise, speech test in soft stationary noise as well as subjective ratings of speech clarity, comfort and overall preference ranking were conducted. Beside a significant improvement of speech intelligibility in cafeteria and office noise the speech test in soft stationary noise showed no difference in speech intelligibility. This outcome was expected since the algorithm is not designed to target this type of noise, and indicates that speech is not harmed in this situation. Nevertheless, subtle differences may have been obscured by ceiling effects considering the participants’ high performance level in this task (as in the same test without noise in previous study) despite adding a background noise of 50 dB. In order to not dispense with such a test in future trials, more difficult speech material could be used or subjects with lower speech performance can be invited.
The effect of TNRsgl found in the current study seems qualitatively different from our previous study.11 There, it achieved a small SRT benefit in dishes clattering of 0.4 dB compared to TNRoff. Here, we used the same dishes clattering noise but mixed with reverberant multi-talker babble, which was labeled as cafeteria noise. Furthermore, the presentation level was set 10 dB higher than in the previous study. Comparing SRT results between both studies reveals that the average speech level, relatively to the transient noise peaks, was 3 dB higher in the current study. As a consequence of the higher speech level and the added babble noise, TNRsgl detected fewer transients in cafeteria noise, which may be one factor explaining the apparent drop in efficacy. Secondly, the higher noise presentation level used here may by itself have affected the combined behavior of TNRsgl and AGC in an unfavorable manner. The designs of the two studies, however, do not allow for a definitive answer in this regard.
Finding the optimal mode of interaction between TNR and AGC is an important topic for further investigation. In both our current and our previous study,11 TNR and AGC were applied in series owing to our technical setup where a pre-processed audio signal was fed into a speech processor with the standard AGC acting on its input. Generally, the AGC lowers its broadband gain by an amount proportional to the input level whenever this level exceeds a certain threshold (around 63 dB SPL for a speech signal).16 Hence, when the TNR algorithm attenuates transients in the input signal, the AGC applies a higher gain than with TNRoff, thereby partially counteracting the level reduction in its input affected by the TNR algorithm. We therefore suggest that larger improvements in listening comfort might be achieved if the AGC mechanism was adjusted to take the effect of TNR into consideration in order to avoid such antagonistic behavior.
An improvement of comfort was found for two of the six types of noise used for testing (cafeteria and aluminum crackling). We compared the band-specific signal levels and amounts of attenuation for the different signals in an attempt to relate the pattern of subjective comfort ratings to fundamental signal properties (Figure 6). It may be noteworthy that the two signals where TNRmult improved comfort were also the two signals with the highest signal levels in the top frequency band (6.6-9.4 kHz). There is, however, no indication that the amount of attenuation applied by the algorithm (overall or in a specific frequency band) can qualitatively predict the outcome of the perceptual tests.
To obtain a possible explanation for these results we also calculated the mark-space ratio (MSR) of the test signals with TNRoff and TNRmult (Figure 6), which provides an objective measure of the degree of transientness of a signal.9 The MSR is calculated by first segmenting an audio signal into consecutive signal frames. Every frame is then squared and low-pass filtered to obtain the signal envelope. The fraction of time when the signal exceeds the mean of the envelope (marks) and when it stays below (spaces) is then calculated. The average ratio of marks and spaces (over all frames) is taken as the MSR of the signal.
A highly transient signal, like for example hammering nails, has an MSR close to 0 with few marks above the envelope as spaces below. A more stationary noise like party noise has more spaces than marks similar to a pure stationary noise, which has an MSR of 1. We observed the greatest increase in MSR (and therefore greatest reduction of transientness) with TNRmult for cafeteria noise and aluminum crackling, i.e., the two types of noises in which improvements in subjective comfort could be obtained. The predictive value of this measure regarding the perceptual effect of the algorithm, however, will need to be further assessed in future studies.
Further to our present investigation, it would be desirable to evaluate the algorithm in more natural listening conditions. This could entail a wide range of free-field tests in the lab, acute tests in relevant real-life scenarios and chronic assessment in a home trial (the latter requiring the implementation on a wearable speech processor). Another topic of relevance within this context is the simultaneous effect of TNR and stationary noise reduction algorithms (as they are being used for CIs to date) in complex noisy situations. Finally, it is worthwhile to investigate whether the benefit of the novel algorithm observed with CI users can be transferred to hearing aids, considering known impact of transient noise particularly on listening comfort of hearing aid users.3,20
Nevertheless, we think that the connection between subjective annoyance ratings and the spectra-temporal characteristics of transient noises needs to be investigated more closely, which may in turn provide the basis for individualized fitting guidelines as well as further technical improvements.
We demonstrated that a multi-band TNR algorithm provided benefits in different types of noise regarding speech intelligibility, listening comfort, speech clarity and overall preference, compared to a single-band algorithm and an unprocessed baseline condition. The multi-band approach results in a more selective treatment of noise and speech whereby only those frequency regions are being attenuated during ongoing speech that are affected by transient noise.