A team of researchers from New York University (NYU), co-led by Adeen Flinker, an Associate Professor of Biomedical Engineering at NYU Tandon and Neurology at NYU Grossman School of Medicine, and Yao Wang, a Professor of Biomedical Engineering and Electrical and Computer Engineering at NYU Tandon, is working to untangle the neural processes that underlie speech. Their goal is to develop voice-reconstruction technology for people who have lost the ability to speak.
The NYU team has used deep neural networks to recreate speech from brain recordings, shedding light on the neural processes that underpin human speech. The findings are detailed in the Proceedings of the National Academy of Sciences (PNAS).
Human speech production is a multifaceted process, involving feedforward control of motor commands and the concurrent feedback processing of self-produced speech. The simultaneous engagement of numerous brain networks makes it a formidable challenge to disentangle the exact timing and extent of cortical recruitment for motor control versus sensory processing during speech production.
In this latest work, the team applied a deep learning architecture to human neurosurgical recordings, pairing it with a rule-based differentiable speech synthesizer to decode speech parameters from cortical signals. The network variants they implemented use causal (current and past neural signals to decode current speech), anticausal (current and future neural signals), or noncausal (a combination of both) temporal convolutions, enabling a detailed analysis of the respective contributions of feedforward and feedback processes in speech production.
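To make the distinction between these temporal regimes concrete, the sketch below (our own illustration under stated assumptions, not the team's published code) shows how causal, anticausal, and noncausal one-dimensional convolutions can be obtained in PyTorch simply by changing where the input is zero-padded; the electrode count, channel sizes, and class name are placeholders.

```python
# Minimal sketch: causal vs. anticausal vs. noncausal temporal convolutions.
# All sizes and names are illustrative assumptions, not the authors' model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalConv1d(nn.Module):
    """1-D convolution whose receptive field is restricted in time.

    mode="causal"     -> output at t sees inputs at times <= t (past only)
    mode="anticausal" -> output at t sees inputs at times >= t (future only)
    mode="noncausal"  -> output at t sees a symmetric window around t
    """

    def __init__(self, in_ch, out_ch, kernel_size, mode="causal"):
        super().__init__()
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size)
        self.mode = mode
        self.pad = kernel_size - 1  # total padding needed to preserve length

    def forward(self, x):  # x: (batch, channels, time)
        if self.mode == "causal":
            x = F.pad(x, (self.pad, 0))            # pad on the left (past)
        elif self.mode == "anticausal":
            x = F.pad(x, (0, self.pad))            # pad on the right (future)
        else:  # noncausal
            left = self.pad // 2
            x = F.pad(x, (left, self.pad - left))  # pad both sides
        return self.conv(x)


if __name__ == "__main__":
    ecog = torch.randn(1, 64, 200)  # e.g. 64 electrodes, 200 time steps
    for mode in ("causal", "anticausal", "noncausal"):
        y = TemporalConv1d(64, 32, kernel_size=9, mode=mode)(ecog)
        print(mode, y.shape)  # time dimension preserved in every mode
```

Padding only on the left means each output sample can draw solely on present and past inputs; padding only on the right gives the mirror-image anticausal case, and symmetric padding yields the noncausal window.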
Dr. Flinker elaborates, “This approach allowed us to disentangle the processing of simultaneous feedforward and feedback neural signals that occur while we produce speech and monitor feedback from our own voice.”
This approach not only decoded interpretable speech parameters but also shed light on the temporal receptive fields of the cortical regions involved in speech production. Contrary to the conventional view that feedback and feedforward cortical networks are separate, the analysis revealed an architecture of mixed feedback and feedforward processing spanning frontal and temporal cortices. This perspective, combined with strong speech decoding performance, marks a substantial step forward in understanding the neural mechanisms behind speech production.
The NYU team has used this understanding to inform the development of prosthetic devices that read brain activity and translate it directly into speech. What sets their prototype apart is its capacity to recreate a patient's own voice from only a small dataset of recordings. For patients who have lost the ability to speak, the technology offers a way not just to produce speech again but to do so in a voice that sounds like their own. It relies on a deep neural network that operates in a latent auditory space and can be trained with just a few samples of an individual's voice, such as a YouTube video or a Zoom recording.
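As one illustration of how such few-shot voice adaptation is often approached (not necessarily the team's exact method), a pretrained decoder can be held fixed while a small speaker embedding in the latent space is fitted to a handful of reference spectrograms taken, for example, from a short video clip; every name and dimension below is an assumption.

```python
# Minimal sketch of few-shot speaker adaptation in a latent auditory space.
# Model sizes, names, and training setup are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpeakerConditionedDecoder(nn.Module):
    """Maps decoded speech parameters plus a speaker embedding to mel-spectrogram frames."""

    def __init__(self, n_params=32, emb_dim=16, n_mels=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params + emb_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_mels),
        )

    def forward(self, speech_params, spk_emb):
        # speech_params: (batch, time, n_params); spk_emb: (emb_dim,)
        emb = spk_emb.expand(*speech_params.shape[:2], -1)
        return self.net(torch.cat([speech_params, emb], dim=-1))


def adapt_to_speaker(decoder, speech_params, target_mels, emb_dim=16, steps=200):
    """Fit only a speaker embedding to a few reference clips; the decoder stays frozen."""
    decoder.requires_grad_(False)
    spk_emb = nn.Parameter(torch.zeros(emb_dim))
    opt = torch.optim.Adam([spk_emb], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(decoder(speech_params, spk_emb), target_mels)
        loss.backward()
        opt.step()
    return spk_emb.detach()


if __name__ == "__main__":
    decoder = SpeakerConditionedDecoder()          # assume pretrained on many voices
    params = torch.randn(4, 50, 32)                # a few short reference utterances
    mels = torch.randn(4, 50, 80)                  # their mel spectrograms
    emb = adapt_to_speaker(decoder, params, mels)  # personalized voice embedding
    print(emb.shape)                               # torch.Size([16])
```

Because only the embedding is optimized, very little data from the target speaker is needed, which is the general principle behind personalizing a voice model from a few samples.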
To acquire the necessary data, the researchers worked with patients with refractory epilepsy, a form of the condition that does not respond to medication. These patients had a grid of subdural electrodes (electrocorticography, or ECoG) implanted for a week to monitor their condition, and they consented to the addition of 64 smaller electrodes interspersed among the regular clinical electrodes. The resulting recordings of brain activity during speech production provided the data for this research.