Communication in demanding environments (e.g., noisy and/or reverberant environments) is a significant challenge faced by hearing impaired (HI) individuals.1 Hearing aids (HAs) equipped with directional microphone processing are the most common treatment solutions to address this problem.2 In general, directional microphones result in 2-5 dB improvement in speech reception thresholds (SRTs) in favourable environmental settings, but this benefit decreases when reverberation and distance from the source are factored in.2 In such challenging acoustic environments, a wireless remote microphone (RM) system can significantly enhance the speech perception abilities of HI listeners [A typical wireless RM system is comprised of a microphone placed close to the desired sound source, a transmitter connected to the microphone for radio frequency (RF) modulation and transmission, and a receiver for demodulation and signal delivery to the hearing aid.]. As an example, Lewis et al.3 reported a mean directional advantage of 2.3 dB for speech understanding in a diffuse noise environment with data collected from forty-six adult HI listeners. In contrast, approximately 20 dB speech perception benefit was observed in the same environmental condition and with the same participant cohort when RMs were utilized. Thus RMs are an attractive assistive listening device option for HI listeners and consequently require clinical attention for proper fitting and performance verification.
Technological advances continue to improve the signal processing, transmission and reception modules within RMs.4 Older generation RMs were based on analog frequency modulation (FM) technology, while newer generation models employ digital wireless communication strategies and incorporate digital signal processing (DSP) algorithms at the transmitter and/or receiver.4 Given these technological differences across models, a few studies have evaluated varied RM technologies with HI subjects and benchmarked their relative performance. For example, Schafer et al.5 compared the performance of RMs with conventional FM and with an adaptive FM (viz. VoicePriority i (VPi) from Oticon, which automatically adjusted the receiver gain based on background noise levels) using objective speech recognition and subjective ratings data collected from twenty HI children. Results showed that the RMs incorporating VPi enabled significantly higher speech recognition scores and better subjective ratings than the traditional FM or HAs alone in high levels of background noise. In a similar vein, Thibodeau6 obtained speech recognition scores in noise from 11 HI adults with three RM systems (fixed FM, adaptive FM, and adaptive digital broadband from Phonak) in background noise levels ranging between 50 and 80 dBA. Results revealed no significant differences in speech recognition scores across RM technologies when background noise levels were ≤60 dBA, but significantly better scores for RMs utilizing adaptive digital technology than RMs incorporating adaptive FM technology at 70, 75, and 80 dBA background noise levels.
The previous two studies compared different RM technologies from the same manufacturer (Oticon and Phonak, respectively). In contrast, Rodemerk and Galster7 compared the performance of four different RM systems that incorporated static analog FM, 900 MHz digital, 2.4 GHz digital, and Bluetooth wireless protocols, respectively. SRTs were obtained from a group of sixteen adult HI listeners with each of these RMs, either alone (HA microphone muted) or in combination with the HA microphone. No statistically significant differences in SRT scores among the four RMs were found in the RM only condition, but the 2.4 GHz RM performed poorly when used in conjunction with the HA microphone, indicating a potential negative interaction between the two signal processing paths. Similarly, Wolfe et al.8 measured sentence recognition with seventeen HI adults in quiet and in four background noise conditions (55, 65, 75, and 80 dBA) using two different RM systems: Phonak Roger and Resound Unite. While both RM systems offered substantial sentence recognition benefit over HAs alone, results showed that the adaptive digital wireless technology from Phonak was significantly better than the Resound system at 75 and 80 dBA background noise levels. Wolfe et al.8 surmised that the adaptive processing in the Phonak RM receiver may have contributed to its better performance.
The studies reviewed above highlight the performance differences among diverse RM technologies using data collected from HI participants. While subjective evaluation of RM performance is indispensable and can be considered the gold standard, it is also time- and resource-intensive. As such, electroacoustic characterization, verification, and benchmarking of RM performance are attractive. Current clinical practice guidelines published by the American Academy of Audiology (AAA)9 recommend electroacoustic verification of RMs based on the transparency criterion, wherein acoustic inputs of 65 dB SPL to the RM and HA microphones are expected to generate equal outputs from the HA. AAA guidelines also recommend that the transparency test be carried out with a speech input signal. While the transparency criterion ensures that the RM under test does not change the frequency response of the HA, it neither assesses any potential distortion through the RM nor captures the impact of proprietary DSP algorithms, especially in noisy environments. The recently published ANSI/ASA S3.47 standard10 recommends a more comprehensive RM assessment including measurements of frequency response curves, frequency range, input-output characteristics, total harmonic distortion, and noise level [Note that the ANSI/ASA S3.47 standard addresses a broad class of Hearing Assistance Devices/Systems (HADS), which includes hardwired RMs and wireless RMs that use radio-frequency (RF), audio-frequency (AF), or infrared wireless transmission protocols. This paper focuses only on the RF-based wireless RMs]. However, the use of broadband noise or pure tones as test stimuli within the standard limits the generalization of RM response to speech inputs in realistic environments.
Similar to the ANSI/ASA S3.22 standard for hearing aids,11 the tests specified in ANSI/ASA S3.47 are therefore more appropriate for quality control purposes as they do not provide meaningful information on assessing the impact of RM systems on perceived speech intelligibility and quality by HI listeners. This is important as there is a significant correlation between the user-perceived speech intelligibility/quality through a hearing device and his/her satisfaction with that device.1
More recently, researchers have endeavoured to extract electroacoustic measures from HA input and output speech stimuli that estimate the perceived speech intelligibility and quality. For example, Kates and Arehart12,13 reported the Hearing Aid Speech Perception Index (HASPI) and Hearing Aid Speech Quality Index (HASQI) metrics for respectively predicting the perceived intelligibility and quality of simulated HAs. In a survey paper, Falk et al.14 validated twelve existing instrumental quality and intelligibility measures with behavioural data collected from HI listeners, and concluded that a subset of these measures (which includes HASPI and HASQI) are appropriate for objective HA performance assessment in a variety of environmental conditions. However, these promising electroacoustic measures have yet to be applied for assessing the performance of RMs.
In summary, RMs offer a substantial benefit to HI listeners in challenging environments with greater levels of background noise and reverberation. Contemporary RMs differ in terms of their microphone configuration (omnidirectional vs directional), wireless communication protocols (FM vs adaptive FM vs digital RF), and additional signal processing (adaptive gain, noise reduction, etc.). Furthermore, the coupling of RMs to personal HAs may result in unwanted changes to the gain/output of the HAs and may lead to unwanted distortions. Electroacoustic measurement of RM performance is attractive for its clinical efficiency and cost-effectiveness, but the measures must have perceptual relevance. Therefore, the objective of this paper is to extend the instrumental speech quality and intelligibility metrics that have been previously validated for HA applications to the assessment of RMs. These instrumental metrics are then used to benchmark the performance of four RM systems from different manufacturers in different acoustic environments.
Materials and Methods
Remote microphones and hearing aid
Four RMs from three different manufacturers were assessed in this paper and their brief technical characteristics are given below: i) Comfort Audio15 digital microphone DM10 and the micro receiver DT10. This RM system utilizes a digital wireless communication protocol (Secure Stream Technology) and supports a dynamic range of 60 dB and an audio bandwidth of 100 Hz to 7000 Hz. The microphone is omnidirectional; ii) Oticon Amigo-T31 and Amigo R2 receiver.16 This RM system uses an adaptive FM protocol with automatic adjustment of FM emphasis based on background noise level (i.e., VoicePriority i [VPi]). It supports an audio bandwidth from 100 Hz to 8500 Hz and has a configurable omnidirectional or directional microphone; iii) Phonak EasyLink and MicroLink (MLxi) receiver.17 This RM system uses adaptive (dynamic) FM technology and supports an audio bandwidth from 100 Hz to 7000 Hz; iv) Phonak Roger-Inspiro and MicroLink (MLxi) receiver.18 This RM system uses an adaptive digital wireless communication protocol in the 2.4 GHz band and a directional microphone. It supports an audio bandwidth of 100 Hz to 7300 Hz. This system also features adaptive gain control at the receiver depending on the background noise level, with a range of gain adaptation larger than that of the dynamic FM. A commercially available behind the ear (BTE) HA (Unitron Quantum Pro S) was used for interfacing to all four RMs under test. This HA was programmed to match the DSL 5.0 adult prescriptive targets for the N4 standard audiogram.19 All advanced signal processing features in the HA such as noise reduction, speech enhancement, and feedback cancellation were turned off. Fit to targets at soft (55 dB SPL), medium (65 dB SPL), and loud (75 dB SPL) input levels, and MPO, were verified in the Audioscan Verifit hearing aid test system. Once the HA fitting was verified, RM transparency was assessed using the Verifit system.
The HA was connected to each RM receiver individually through the Direct Audio Input (DAI) and the HA was set to RM only (i.e., HA microphone off). The transparency criterion stipulates that equal inputs to the RM and HA microphones must generate equal outputs from the HA, and that transparency is met if the average difference between the HA only and RM only curves at 750, 1000 and 2000 Hz is within ±2 dB.9 However, a difference of ±5 dB was considered acceptable in a recently published work comparing different RMs.6
Figure 1 displays the frequency response curves obtained in Verifit, which shows the long-term averaged spectra for a 65 dB SPL speech input. The four RM responses are indicated by different colors in this figure, with the crosses denoting the DSL 5.0 targets. It is evident from Figure 1 that the RM frequency responses were similar to each other between 500 Hz and 4000 Hz at 65 dB SPL input. The average difference among the four RMs at 750, 1000, and 2000 Hz was within ±2 dB, indicating that transparency was achieved. Even at an input level of 75 dB SPL, the average difference across the same frequencies was 5 dB, with the highest difference observed at 750 Hz.
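The transparency check described above reduces to simple arithmetic on two measured output curves. The following Python sketch is illustrative only: the function names, the dictionary layout (frequency in Hz mapped to HA output in dB SPL), the use of a signed average, the exclusive tolerance bound, and the example values are all assumptions made here, not part of the AAA guideline or of the Verifit software.

```python
def transparency_offset(ha_only_db, rm_only_db, freqs_hz=(750, 1000, 2000)):
    """Mean (RM-only minus HA-only) output difference in dB.

    Both arguments map frequency in Hz to HA output level in dB SPL.
    A signed average is an assumption; the text only says "average difference".
    """
    diffs = [rm_only_db[f] - ha_only_db[f] for f in freqs_hz]
    return sum(diffs) / len(diffs)


def is_transparent(ha_only_db, rm_only_db, tolerance_db=2.0):
    """True when the mean offset is smaller in magnitude than the tolerance.

    Treating the tolerance as an exclusive bound is an assumption.
    """
    return abs(transparency_offset(ha_only_db, rm_only_db)) < tolerance_db


# Hypothetical output levels (dB SPL), not data from this study.
ha_only = {750: 95.0, 1000: 98.0, 2000: 101.0}
rm_only = {750: 96.5, 1000: 97.0, 2000: 102.0}

offset = transparency_offset(ha_only, rm_only)   # (1.5 - 1.0 + 1.0) / 3
print(f"mean offset = {offset:.2f} dB, transparent = {is_transparent(ha_only, rm_only)}")
```

Passing `tolerance_db=5.0` would apply the more lenient ±5 dB tolerance mentioned above.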
Experimental setup and data collection
For electroacoustic measurements, the BTE HA + RM receiver assembly was connected to an ear mold simulator and placed on a Head and Torso Simulator (HATS). This HATS served as the listener. The RM transmitter was placed on a different HATS with a built-in mouth simulator, which served as the talker (Figure 2). Speech stimuli (IEEE Harvard sentences) were presented through the mouth simulator and separate speaker(s) were used to present broadband background noise. The RM transmitter’s microphone was placed 20 cm away from the centre of the mouth simulator, where the measured speech presentation level was 80 dBA.
Recordings at the listener HATS in response to speech playback at the talker HATS were collected for the following conditions: in quiet, and with uncorrelated noise at 0 and 10 dB SNR. The listener HATS was calibrated using a Bruel & Kjaer acoustic calibrator.
The tests were performed in two different environments: an acoustically benign sound booth (environment #1) with low reverberation (RT60 = 0.1 s) and a single noise source; and an acoustically harsher reverberation chamber (environment #2) with a higher degree of reverberation (RT60 = 0.76 s) and surround noise. The speech was played back through the built-in mouth simulator. Noise was played back through one speaker placed halfway along, and perpendicular to, the line connecting the talker and the listener in environment #1, and through four speakers at 0°, 90°, 180°, and 270° azimuth in environment #2, as shown in Figure 2. In both environments, the noise level (and hence the SNR) was measured at the centre of the listener’s head. All RM recordings were digitized at a 16000 Hz sample rate and 16 bits/sample, and stored on a computer for offline analyses. As described in the previous section, some RMs have the option of selecting between omnidirectional and directional configurations for the transmitting microphone. In such cases, data were collected separately for the two microphone configurations.
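In this study the SNR was set acoustically, from levels measured at the listener position. For readers who wish to reproduce comparable conditions with digitized signals, the sketch below shows the usual way to scale a noise recording so that a speech-plus-noise mixture reaches a target SNR. It is a digital analogue under stated assumptions, not the procedure used here: it uses unweighted RMS rather than A-weighted levels, and the function names and sample values are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def noise_gain_for_snr(speech, noise, snr_db):
    """Linear gain for `noise` so that 20*log10(rms(speech)/rms(gain*noise)) == snr_db."""
    return rms(speech) / (rms(noise) * 10 ** (snr_db / 20))


def mix_at_snr(speech, noise, snr_db):
    """Sample-wise sum of speech and the rescaled noise (truncated to the shorter input)."""
    gain = noise_gain_for_snr(speech, noise, snr_db)
    return [s + gain * n for s, n in zip(speech, noise)]


# Hypothetical toy signals; real use would load the digitized recordings instead.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]

for target in (0, 10):
    g = noise_gain_for_snr(speech, noise, target)
    achieved = 20 * math.log10(rms(speech) / rms([g * n for n in noise]))
    print(f"target {target} dB -> noise gain {g:.3f}, achieved {achieved:.1f} dB")
```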
Speech intelligibility and quality metrics
Following the survey paper by Falk et al.,14 three objective indices were chosen in this study for benchmarking the RM performance. These include the aforementioned HASPI and HASQI metrics, both of which are intrusive measures in that they require access to the clean speech signal at the input of the transmitting microphone (explained in more detail below). However, practical applications involving RM use may not facilitate access to the clean speech input. Therefore, a non-intrusive or reference-free objective metric that estimates the speech quality or intelligibility from the HA output alone is desirable. As such, this study included a non-intrusive metric termed the Speech to Reverberation Modulation Energy Ratio-HA (SRMR-HA), which has been previously validated with HA data.20 A brief description of the computational details behind these three metrics is given below.
HASPI and HASQI
A computational model of the peripheral auditory system is at the heart of both HASPI and HASQI calculations. The indices are derived from the envelope and fine-structure features extracted from the outputs of a 32-channel gammatone filterbank mimicking the cochlear auditory processing. Effects of sensorineural hearing loss (SNHL) such as the broadening of auditory filters, elevated thresholds, and loudness recruitment are incorporated into the computational model based on the input audiogram.12,13 HASPI and HASQI differ on how the envelope and fine-structure features are weighted.
In HASPI, the clean reference speech signal is processed through a normal hearing auditory model, while the test signal (i.e., the corresponding HA output) is processed using an impaired auditory model. The envelope index is derived as the averaged cross-correlation between compact representations of the envelopes extracted from the clean and test stimuli, while the fine-structure index is computed as the averaged cross-correlation between the higher intensity components of the gammatone filterbank outputs for the clean and test stimuli. A linear weighting of these two features followed by a logistic function transformation is used to derive the final HASPI measure. HASPI was compared with the Coherence-based Speech Intelligibility Index (CSII) and the Short-Time Envelope Correlation Index (STECI) for a dataset comprising speech with additive babble and processed through frequency compression, and was shown to be superior for these data.12
In HASQI, both the clean and test stimuli are processed using the impaired auditory model. The final HASQI value is the product of a nonlinear index and a linear index. The nonlinear index is derived from the previously described envelope and fine-structure correlations between clean and test signals. The linear index is derived by linearly combining the standard deviation of the differences in the values and slopes of the long term averaged spectra.13 HASQI was previously shown to correlate well with subjective ratings of HA speech quality by HI listeners for datasets incorporating simulated HA processing13 and real HAs in noisy and reverberant environments.20
As discussed above, both HASQI and HASPI need a reference signal to assess the quality and intelligibility of the HA output. In the present study, for HASPI, this clean reference was the speech stimulus that was played back through the mouth simulator. For HASQI, the same speech stimulus was filtered based on the DSL 5.0 adult targets for the N4 audiogram to match the frequency shaping that the HA applied.
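The final combination steps of the two indices, as summarized above, can be sketched in a few lines of Python. This is a structural illustration only: it assumes the envelope, fine-structure, nonlinear and linear terms have already been computed by the auditory model, and the bias and weights shown are hypothetical placeholders, not the published HASPI coefficients.

```python
import math


def haspi_like(envelope_index, fine_structure_index, bias, w_env, w_fs):
    """Linear weighting of two features followed by a logistic transformation.

    bias, w_env and w_fs are placeholders supplied by the caller; the fitted
    values from the original HASPI publication are NOT reproduced here.
    """
    z = bias + w_env * envelope_index + w_fs * fine_structure_index
    return 1.0 / (1.0 + math.exp(-z))


def hasqi_like(nonlinear_index, linear_index):
    """Overall quality as the product of a nonlinear term and a linear term."""
    return nonlinear_index * linear_index


# Hypothetical feature values and weights, for illustration only.
score_good = haspi_like(0.9, 0.8, bias=-4.0, w_env=5.0, w_fs=3.0)
score_poor = haspi_like(0.3, 0.2, bias=-4.0, w_env=5.0, w_fs=3.0)
print(f"HASPI-like: {score_good:.3f} vs {score_poor:.3f}")
print(f"HASQI-like: {hasqi_like(0.8, 0.5):.2f}")
```

The logistic transformation keeps the intelligibility score between 0 and 1, and the product form of the quality score means that a low value of either term pulls the overall quality down.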
SRMR-HA
The SRMR-HA is a reference-free speech quality estimator, which is a modified version of the Speech to Reverberation Modulation Energy Ratio (SRMR). SRMR-HA is calculated as the ratio of the averaged modulation energies in the lower four (4-18 Hz) and upper four (29-128 Hz) modulation channels.20 In contrast to the original SRMR, the SRMR-HA computation incorporates the SNHL effects by varying the Q factor of each gammatone channel and compressing the signal envelope in each channel based on the audiogram, in a manner similar to the HASQI procedure.
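The ratio at the core of this metric can be illustrated with a short Python sketch. It assumes that per-band modulation energies are already available as a table with one row per acoustic (gammatone) band and eight modulation channels ordered from low to high modulation frequency, with the first four channels forming the lower group and the last four the upper group. The function name and the toy values are hypothetical, and the hearing-loss-dependent filtering and envelope compression of SRMR-HA are not modelled.

```python
def srmr_like_ratio(mod_energy):
    """Ratio of low to high modulation-channel energy.

    mod_energy[j][k] is the energy in acoustic band j and modulation channel k.
    Eight modulation channels are assumed: indices 0-3 form the lower group
    and indices 4-7 the upper group.
    """
    n_mod = len(mod_energy[0])
    if n_mod != 8:
        raise ValueError("expected 8 modulation channels per acoustic band")
    # Average each modulation channel across the acoustic bands.
    avg = [sum(band[k] for band in mod_energy) / len(mod_energy) for k in range(n_mod)]
    return sum(avg[:4]) / sum(avg[4:])


# Hypothetical energies for two acoustic bands, not data from this study.
toy = [
    [4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0],
]
print(srmr_like_ratio(toy))
```

Because the upper-channel energy sits in the denominator, a recording with relatively more energy at high modulation frequencies receives a lower ratio.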
The performance of SRMR-HA was validated and compared with that of HASQI using subjective speech quality ratings collected from HI listeners. Results showed that although the SRMR-HA was not as effective a predictor of subjective speech quality as HASQI, it was still able to explain 70% of the variance in the subjective data. As such, it was deemed a promising non-intrusive estimator of HA speech quality and was applied to the RM recordings in this study.
Results
The four RMs investigated in this study were randomly labeled RMA, RMB, RMC, and RMD. Figure 3 displays the spectrograms computed from a sample set of RM recordings in environment #1, which allow for gauging the frequency range and noise level. Figure 3A and B depict the spectrograms of RMA recordings in the speech-in-quiet and speech-in-noise (SNR = 0 dB) conditions respectively, while Figure 3C and D show the corresponding spectrograms for RMB. It is evident from Figure 3C that the bandwidth of RMB is limited to less than 6 kHz, contrary to its specifications. Moreover, a higher internal noise level at higher frequencies can be observed in the RMB recording in Figure 3C. Figure 3B shows that RMA is more robust to the background noise than RMB, mainly due to the directional microphone configuration at its transmitter. Figure 4 displays the HASPI, HASQI, and SRMR-HA values obtained from the RM recordings in both environments, with no background noise and in the presence of background noise at 10 dB and 0 dB SNRs. It must be noted here that both HASPI and HASQI values are normalized to a range of 0-1, while there was no normalization of the SRMR-HA values. For all three metrics, higher values indicate better intelligibility/quality.
As expected and evident in Figure 4, the objective metrics decrease with an increase in the background noise level and/or reverberation. Taking a closer look at the HASPI data, it can be seen that all RMs exhibit similar performance in quiet in environment #1. This condition is similar to the transparency verification condition in the test box, and implies that all RMs perform similarly when there is no background noise and low reverberation. However, differences do emerge in RM performance in quiet in the presence of reverberation (Figure 4B) and in the presence of background noise in both environments. At SNR = 10 dB, RMA has the best performance, followed by RMC utilizing a directional microphone. The performance gap between RMA and the rest of the RMs widened at 0 dB SNR (Figure 4A and B). Several factors may have contributed to the comparatively better performance of RMA, including the presence of a directional microphone, which not only reduces background noise but also partially attenuates reverberation components. In addition, RMA incorporates an adaptive gain control strategy at the receiver.
In general, HASQI and SRMR-HA values follow a trend similar to that of the HASPI scores across noise and reverberation conditions. In fact, the correlation coefficient between HASPI and HASQI scores was 0.95, while the correlation coefficients of SRMR-HA with HASPI and HASQI were 0.75 and 0.71 respectively. The lower correlation coefficients exhibited by SRMR-HA are mainly due to the discrepancies in scores for RMD: SRMR-HA ranks RMD more favourably, especially in quiet conditions. In order to gain further insight into this, modulation spectrogram plots were obtained from RMA and RMD recordings and are displayed in Figure 5. Recall that the SRMR-HA computes the ratio of the averaged modulation energies between the 4-18 Hz and 29-128 Hz modulation channels. It can be noticed from Figure 5A and C that the RMA recording in quiet has a pocket of energy in the upper modulation frequency region, while the RMD recording is devoid of it. As such, the SRMR-HA resulted in a higher score for RMD for the speech in quiet condition. At an SNR of 10 dB (Figure 5B and D), it can be seen that the modulation spectrogram of the RMA recording was relatively unchanged (thus highlighting the robustness of RMA), while that of the RMD recording was significantly affected.
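The agreement between metrics reported above is a correlation computed over paired scores, one pair per device and condition. The sketch below shows such a computation in Python. It assumes the Pearson product-moment coefficient, which the text does not specify, and the score lists are hypothetical placeholders rather than the values plotted in Figure 4.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores (one pair per device/condition), for illustration only.
haspi_scores = [0.99, 0.95, 0.80, 0.60, 0.35]
hasqi_scores = [0.70, 0.62, 0.45, 0.30, 0.15]
print(f"r = {pearson_r(haspi_scores, hasqi_scores):.2f}")
```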
Discussion
Wireless RMs are an attractive assistive listening device option for HI listeners in challenging acoustical environments. Currently available wireless RMs differ in their wireless communication, microphone configuration, and internal signal processing. Behavioural studies have shown that these inter-device differences do lead to performance differences, with some RMs performing better than others. This paper took an alternative approach in which the relative performance of four different RMs was benchmarked through objective, instrumental predictors of perceived speech quality and intelligibility by HI listeners.
The objective indices employed in this paper, viz. HASPI, HASQI, and SRMR-HA, have all previously been validated with data collected from HI listeners. The HASQI is an example of an intrusive speech quality estimator, where features extracted from the HA output are compared with the corresponding features from a properly selected reference input, and the differences quantified. HASQI was previously validated with subjective data collected from HI listeners using simulated hearing aid processing, and in non-reverberant environments.13,20 Similarly, SRMR-HA is an extension of the SRMR metric, with the incorporation of computational blocks simulating the broadening of auditory filters due to SNHL and loudness recruitment. SRMR-HA was shown to correlate modestly with speech quality ratings obtained from HI listeners.20 While these metrics have not been directly validated with behavioural data collected with RMs, it must be noted that all three metrics base their computation on the signal captured at the tympanic membrane (i.e., the HA output). Thus any distortion or enhancement introduced by the RM will be reflected in the HA output and will therefore have a corresponding influence on the calculated score. Nonetheless, in the future it will be beneficial to collect behavioural speech quality and intelligibility data from HI listeners for different RMs across different environmental conditions and to utilize these data to further validate the three objective metrics discussed in this paper.
The performance of the four RMs was benchmarked in two different environments: an acoustically benign environment with a single noise source and low reverberation, and a harsh environment with diffuse noise and reverberation. The metrics showed the expected trend that RM performance degrades with an increase in background noise level and reverberation. What is noteworthy is the differential degradation in RM performance across SNRs and reverberation conditions. This is despite the fact that all RMs were verified to be transparent, suggesting that transparency verification alone is inadequate for characterizing RM performance. It is clear from Figure 4 that RMA has markedly higher HASPI, HASQI, and SRMR-HA scores in both environments at 0 dB and 10 dB SNRs. In addition, internal noise measurements have shown that RMA had the lowest noise floor among the four RM systems. RMA has the following salient features: i) proprietary digital wireless communication protocol in the 2.4 GHz band employing time and frequency diversity; ii) array microphone at the transmitter; and iii) additional signal processing at the receiver for automatic gain control in response to background noise level. Previous studies by Thibodeau6 and Wolfe et al.8 have shown this RM to provide significantly better speech recognition in noise by HI listeners, in comparison to other RM technologies. Thus, the electroacoustic data presented in this paper corroborate the published behavioural data for this RM.
In comparing the performance of RMs other than RMA, two factors are of interest. One is the configuration of the transmitting microphone. Figure 4 shows the data from RMC in omnidirectional and directional microphone configurations, with the directional microphone providing better objective scores. The second is the analog vs. digital wireless communication strategy. RMB and RMC employ analog FM, while RMD employs a digital wireless protocol. Figure 4 shows equivalent performance among these three devices across different environments, indicating that digital wireless protocols per se are not superior to legacy analog FM technologies.
Although the present study utilized two mannequins (one representing the talker and the other representing the listener) for collecting the RM recordings, similar electroacoustic benchmarking can be accomplished in standard HA test boxes, in addition to the AAA transparency verification and the standardized ANSI/ASA S3.47 tests. In order to further investigate this, another set of recordings was made with the transmitter’s microphone placed 4 cm from the centre of the mouth simulator. This 4 cm distance also simulates the condition in which a boom microphone is used as the transmitting microphone. HASPI and HASQI results extracted from the 4 cm RM recordings correlated highly with the values extracted from the 20 cm recording set (0.97 and 0.93 respectively). Most HA test boxes (such as the Audioscan Verifit) are equipped with two loudspeakers within the test box enclosure, one for speech presentation and the other for noise presentation. This enables simulation of conditions similar to the environment #1 conditions explored in this study. Thus, the electroacoustic metrics investigated in this study can potentially be included in standard HA test boxes for further characterizing RM performance.
Conclusions
The performance of four different RMs was benchmarked in different acoustic environments using electroacoustic measures that predict perceived speech intelligibility and quality by HI listeners. Results showed disparity in RM performance, and the RM that utilized an array microphone at the transmitter, a proprietary digital wireless communication protocol, and additional processing at the receiver performed the best and was relatively more robust to increases in background noise level and reverberation. The electroacoustic measures explored in this study can supplement the AAA’s device transparency guideline and the quality control tests outlined in the ANSI/ASA S3.47 standard.