A COMPARATIVE SPEECH ENHANCEMENT ANALYSIS ON TURKISH AND ENGLISH LANGUAGES USING ATTENTION-BASED WAVE-U-NET ARCHITECTURE


Creative Commons License

Gündüz A., Akgül İ.

24th INTERNATIONAL İSTANBUL SCIENTIFIC RESEARCH CONGRESS ON LIFE, ENGINEERING, ARCHITECTURE, AND MATHEMATICAL SCIENCES, İstanbul, Türkiye, 20-22 February 2026, pp. 879-890 (Full-Text Paper)

Abstract

Speech enhancement (SE) is a signal processing problem that aims to improve the intelligibility and quality of speech signals in noisy environments. While the vast majority of studies in the literature focus on English datasets, performance analyses on languages with distinct phonetic characteristics and agglutinative structure, such as Turkish, remain limited. This study proposes an Attention Wave-U-Net architecture that operates end to end on raw waveforms and is reinforced with residual blocks, channel attention, and self-attention mechanisms. A large-scale analysis was conducted on a total of 70,939 audio files in English and Turkish. To evaluate the robustness of the model, real-world noises selected from the Microsoft DNS Challenge dataset and synthetic white noise were mixed into the clean signals at SNR levels between 0 and 15 dB. Experimental results show that the proposed method yields a significant improvement in both languages and performs best on the Turkish dataset. On the Turkish data, the average PESQ score rose to 3.1875, and an average gain of 19.7768 dB was achieved in the time-domain SI-SNR metric. On the English data, the average PESQ score rose to 2.5813, with an average SI-SNR gain of 10.6260 dB. These results indicate that attention-based models can adapt to language-specific acoustic characteristics, and they contribute to speech enhancement research on the Turkish language.
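The abstract refers to two standard operations: mixing noise into clean speech at a chosen SNR, and scoring the result with the time-domain SI-SNR metric. The sketch below illustrates both under the usual textbook definitions. It is not the authors' code; the function names `mix_at_snr` and `si_snr`, the looping of short noise clips, and the `eps` stabiliser are assumptions made for illustration.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so that the mixture has the requested SNR in dB.

    Hypothetical helper: the abstract does not describe the mixing procedure.
    """
    clean = np.asarray(clean, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat or trim the noise so it matches the clean signal's length.
    noise = np.resize(noise, clean.shape)
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        raise ValueError("noise is silent; cannot scale it to a target SNR")
    # Choose gain g so that 10*log10(clean_power / (g**2 * noise_power)) == snr_db.
    gain = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + gain * noise


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (standard definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    # Remove the DC offset from both signals.
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target; the remainder counts as noise.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = scale * target
    residual = estimate - projection
    return 10.0 * np.log10((np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps))
```

If the reported SI-SNR gain is read as an improvement over the noisy input (an assumption, since the abstract does not define it), it would be computed as `si_snr(enhanced, clean) - si_snr(noisy, clean)`, where `enhanced` is the model output for the mixture `noisy = mix_at_snr(clean, noise, snr_db)`.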