Title:
Recognition of Second-Language (L2) Speech by Human and Machine Listeners
Abstract:
All else being equal, both human and machine listeners typically exhibit higher error rates for second-language (L2) than for first-language (L1) speech, particularly under realistic listening conditions involving background noise. In this talk, I will examine the challenge of recognizing L2 speech in noise through the lens of a comparison between human speech recognition (HSR) and automatic speech recognition (ASR). I will present recent data showing a narrowing of the HSR-ASR gap, to the extent that a state-of-the-art ASR system can match, and even exceed, the recognition accuracy of human listeners. However, close examination of HSR and ASR response patterns reveals critical qualitative divergences. Specifically, under low signal-to-noise-ratio conditions, ASR is more likely than HSR to resort to signal-independent “hallucinations.” Moreover, HSR is strongly characterized by almost immediate adaptation to the talker and background noise, whereas it remains unclear whether, and how, such “in-context” learning is available to current ASR systems. Overall, the HSR-versus-ASR comparison opens a new window into critical features of HSR while at the same time establishing an empirical basis for determining when and how ASR may be appropriate for accelerating basic research in the speech, language, and hearing sciences.