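Following rigorous review by the program committee and the expert reviewers, the second-round review results for this year's conference papers have been released, and all acceptance notices have been sent by email. Authors, please check your inboxes. Please revise as requested and submit the final version (Camera Ready Paper) by the stated deadline.
Thank you all for your support!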
Accepted papers (including the first round)

| Paper ID | Accepted paper title |
| --- | --- |
| 2 | the analysis for vowels in neutral tone syllable |
| 3 | 面向心音识别的自监督联邦学习 |
| 4 | The attention-based fusion of master-auxiliary network for speech enhancement |
| 5 | “去1+VP+去2”中“去2”的轻化程度及语义虚实分析 |
| 7 | D-AGNet: A Dual-branch Network with Attention Guidance for Speech Emotion Recognition |
| 8 | 基于跨语言数据迁移的端到端伪造语音检测方法 |
| 9 | Zero-Shot Personalized Voice Synthesis with Cross-Attention Speaker Embeddings |
| 10 | ASD-Diff: Unsupervised Anomalous Sound Detection With Masked Diffusion Model |
| 11 | A longitudinal study on the acquisition of standard Chinese monophthongs by intermediate- and advanced-level Korean Learners |
| 12 | PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding |
| 14 | A Comprehensive Data Processing Pipeline for LLM based Text-to-Speech Models |
| 15 | 广州话韵律边界时长模式研究 |
| 16 | MIXDIFF: Mixture Diffusion Model for Efficient Text-to-Speech |
| 17 | Analysis and Construction of Corpus Based on Kazakh Text Character Encoding |
| 19 | 声调音高曲线的调头处理办法初探 |
| 21 | 普通话与重庆话长时基频特征的时长阈限研究 |
| 22 | StyleSVC: Singing Voice Conversion with Multi-Scale Style Transfer |
| 25 | 基于互信息特征解耦的情感一致性语音转换 |
| 29 | 基于音高线索的普通话对比焦点感知研究 ——以阳平、去声为例 |
| 30 | 基于去噪扩散概率模型的对抗攻击方法 |
| 31 | Contrastive focus perception in Mandarin from pitch -- Take Tone 1 and Tone 3 as an example |
| 33 | 中级俄语母语者汉语阳平双字调声学分析 |
| 36 | 维吾尔语清塞音k习得的声学特征研究 |
| 37 | Speech emotion recognition based on multi acoustic feature fusion |
| 38 | 跨语言音系对比及错误分析研究 |
| 40 | Data Augmentation and Progressive Learning in Acoustic Echo Cancellation for Duplex conversations on Mobile Devices |
| 41 | Emergence of Hemispheric Asymmetries and Predictive Coding in the Neural Mechanism of Speech Perception |
| 42 | A pilot study on the perception of "dearing" emotional speech |
| 43 | Phoneme Semantic Backdoor Attacks with Multiple Task Learning for Speech Classification Task |
| 45 | Burmese Speech Synthesis Based on Diffusion Model |
| 46 | IUMSS-CETL: Low-Resource Iu Mien Speech Synthesis based on Transfer Learning |
| 50 | AESR: Speech Recognition With Speech Emotion Recogniting Learning |
| 51 | 基于脑控嵌入向量的语音分离网络 |
| 55 | 基于声学参数的蒙古呼麦情感表现研究 |
| 56 | 鄂温克语阴阳元音舌根位置的声学表现 |
| 57 | A Comparative Analysis of Diphthong Acquisition in Standard Chinese by Learners from ‘the Belt and Road’ |
| 58 | 复杂动态系统背景下邹平方言入声调变异速率实证研究 |
| 60 | Domain Adaptation for Front-End Module in TTS with LoRAs |
| 63 | 东部裕固语长短元音空间分布的声学统计分析 |
| 64 | ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram |
| 65 | Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech |
| 66 | M-CMGAN: Attempting to Use Mamba on Speech Enhancement |
| 68 | DA-KWFormer: A Domain Adaptation Network with K-Weight Transformer for Speech Emotion Recognition |
| 69 | Anomalous Sound Detection Using Time-Frequency Feature and Mixbatch |
| 70 | Improving Speaker Verification Back-end with Graph Neural Networks |
| 71 | Integrating Time-Frequency Domain Shallow and Deep Features for Speech-EEG Match-Mismatch of Auditory Attention Decoding |
| 72 | 基于注意力机制和数据过采样的酒瓶裂纹敲击异常声音检测系统 |
| 74 | Dual-Path Spectrogram Refinement Network for Robust Speaker Verification |
| 75 | Transformer-based Model for Auditory EEG Decoding |
| 76 | 刻意伪装场景下的说话人确认 |
| 77 | 基于语音谐波结构的多通道语音增强网络 |
| 78 | 基于文化分析的跨语言语音情感识别 |
| 79 | 布朗语学龄儿童国家通用语声调习得研究 |
| 80 | 融合多源知识的文物描述自动生成方法研究 |
| 81 | Enhancing Transducers for Robust Keyword Spotting by Duration Modeling |
| 82 | A Backend-friendly On-device Multi-channel Speech Enhancement System with IPD and PHM |
| 83 | SESNet: A Speech Enhancement and Separation Network in Noisy Reverberant Environments |
| 85 | 普通话双音节词连上变调的信息机制研究 |
| 86 | 基于解耦学习的鲁棒说话人验证研究 |
| 87 | A Neural Denoising Vocoder for Clean Waveform Generation from Noisy Mel-Spectrogram based on Amplitude and Phase Predictions |
| 89 | BER: Balanced Error Rate For Speaker Diarization |
| 90 | Adaptive Context Biasing in Transformer-based ASR Systems |
| 91 | Baseline Systems for Chinese Continuous Visual Speech Recognition Challenge 2024 |
| 92 | Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations |
| 93 | 潮州话入声调的韵律边界效应 |
| 96 | Enhancing Mispronunciation Detection with Multi-Speaker Text-to-Speech and Mixture-of-Experts Network |
| 97 | A study on the allocations of prosody boundaries in L2 Mandarin speech by Vietnamese learners based on dependency syntax |
| 99 | MHAN: Bottleneck Fusion Model Based on Hybrid Attention Network for Multimodal Emotion Recognition |
| 100 | Paraformer-v2: An improved single step non-autoregressive transformer for noise-robust speech recognition |
| 101 | Sound Zone Control Based On A Kronecker Second-order Tensor Decomposition |
| 102 | 多民族语言语音到语音翻译研究进展综述 |
| 103 | 面向心理治疗的共情对话系统 |
| 104 | A Brief Survey on the Explainability for Deep Speech Processing Models |
| 105 | Are Transformers in pretrained LM A Good ASR Encoder? An Empirical Study |
| 107 | 基于统计分类的一级甲等普通话五度调值划分研究 |
| 108 | 俄罗斯学生普通话辅音/t/、/tɕ/的感知同化与区分 |
| 111 | MCDubber: Multimodal Context-Aware Expressive Video Dubbing |
| 114 | 面向说话人识别的最近邻惩罚圆损失函数 |
| 115 | TeleSpeechPT: Large-Scale Chinese Multi-Dialect And Multi-Accent Speech Pre-Training |
| 116 | Investigation into the Impact of Speaker Adversarial Perturbation on Speech Recognition |
| 117 | Speaker extraction with verification of present and absent target speakers |
| 118 | Exploring Discrete Tokens Suitable for Speech Synthesis |
| 119 | Stage-Wise and Prior-Aware Neural Speech Phase Prediction |
| 120 | Pruning and Quantization Enhanced Densely Connected Neural Network for Efficient Acoustic Echo Cancellation |
| 122 | MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios |
| 123 | LightAC: Lightweight Accent Conversion via Module-Wise Distillation |
| 125 | 基于双对齐和对比学习的多模态情感识别 |
| 126 | 越南学生汉语-越南语声调的感知同化模式 |
| 127 | 德昂语的焦点韵律实现 |
| 128 | 基于神经网络的重音提取及重音描述提示生成 |
| 129 | Improved DOA Estimation of Sound Source of Small Amplitudes Using a Single Acoustic Vector Sensor |
| 130 | 基于通道注意力特征融合的异常音检测方法 |
| 133 | 融合语种信息的端到端多语种语音识别方法 |
| 135 | Investigation on Training Strategy for Cross-Modal Large Language Models with Speech and Text |
| 136 | XCB: an effective contextual biasing approach to bias cross-lingual phrases in speech recognition |
| 137 | An local-aggregation based method for robust speaker verification |
| 138 | An acoustic study of Mandarin Chinese vowels produced by Uyghur-speaking learners |
| 139 | ExARN: Target Speaker Extraction with Attentive Recurrent Networks |
| 141 | Emotional Speech Synthesize via Visual Context Perception |
| 143 | Tone Perception by Putonghua-Learning Preschool Children in South Xinjiang Uyghur Autonomous Region |
| 145 | 基于多流序列和对比学习的音乐生成研究 |
| 147 | 宜人性与开放性人格特质对情绪语音特征的影响 |
| 148 | 外倾性和神经质人格特质对情绪语音特征的影响 |
| 150 | 单音节阳平声调拐点位置与音节结构关系考察 |
| 151 | 基于注意力机制软切分的发音偏误检测探究 |
| 152 | Study on Prosodic Disambiguation of VP/NP syntactic structure by Chinese EFL Learners |
| 153 | An electroencephalogram-based study of neural responses to imagined speech in Mandarin |
| 154 | 基于语音病理特征的不流畅语音片段标注系统 |
| 155 | 基于脑电信号的汉语普通话语音分类研究 |
| 156 | A Speech Corpus of Putonghua-Learning Preschoolers From the Uygur Ethnic Group in South Xinjiang Uygur Autonomous Region |
| 157 | Evaluation of Data Inconsistency for Multi-modal Sentiment Analysis |
| 159 | 基于尖峰特征的口音识别和语音识别多任务学习方法 |
| 160 | A Study of Phonetic Differences in Hankou Dialect in the late Qing Dynasty -- Based on the records of A Chinese-English Dictionary and The Hankow Syllabary |
| 163 | Efficient Singular Spectrum Mode Ensembler Capable of Extracting Wide-band Components in Spectrum Overlapping Scenarios |
| 167 | 俄语母语者普通话舌尖元音[ɿ]、[ʅ]产出训练研究 |
| 168 | 普通话擦音的空气动力学特性 |
| 171 | LDMME: Latent Diffusion Model for Music Editing |
| 172 | 基于前音节锚点音高规整的重音和间断特征考察 |
| 174 | Analyzing the Improvement of Supplementary Features on Voice Conversion Using Disentangled Representation Learning |
| 178 | 基于计划梯度反转的说话人无关韵律表征研究 |
| 180 | 基于Transformer编码改进GCRN网络的单通道语音增强方法 |
| 181 | 面向CPEP3评估的孤独症谱系障碍儿童语言表达能力自动化预测方法 |
| 182 | Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech |
| 183 | Multilingual Speech Recognition Using Discrete Tokens with a Two-step Training Strategy |
| 184 | Mongolian Speech Recognition Based on Semi-Supervised Learning and Syllable Subword Modeling Units |
| 185 | 视频信息辅助的歌声旋律提取 |
| 186 | Dynamic properties of diphthongs in Hefei Mandarin |
| 187 | 削波语音声学特征研究 |
| 188 | 中级水平俄语母语者汉语停延习得研究 |
| 189 | 哈萨克斯坦汉语学习者辅音声母习得分析 |
| 190 | On the effectiveness of enrollment speech augmentation for Target Speaker Extraction |
| 191 | 俄语母语者汉语同声调句中的声调产出研究 |
| 193 | Demystifying the Robustness of Deep Speaker Recognition Against Non-Speech Segments |
| 196 | An Unsupervised Domain Adaptation Method based on Distribution Alignment for Speaker Verification |
| 199 | Improving Emotion Recognition with Pre-trained Models, Multimodality, and Contextual Information |
| 200 | 普通话学习者与母语者的嗓音质量对比分析 |
| 202 | Cross-Model Knowledge Distillation and Metadata Fusion for Respiratory Sound Classification |
| 203 | Tibetan-Chinese Speech-to-Speech Translation Based on Large Speech Models |
| 204 | A Study on the Effect of Focus on Vowel Duration and Formant in Cantonese |
| 205 | Tibetan Speech Synthesis Based on Pre-trained Mixture Alignment FastSpeech2 |
| 206 | 儿化感知影响因素研究 |
| 207 | 俄语母语者汉语朗读流利度评分模型探究 |
| 210 | A New Parameter to Indicate the Syllable Stress Levels in Mandarin |
| 212 | 藏语安多方言塞音发声的声学和电声门图研究 |
| 213 | 基于超声成像的藏语安多方言塞音研究 |
| 214 | Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats |
| 215 | Task-Specific Dementia Risk Detection Based on Psycholinguistic Knowledge and Linguistic Features |
| 216 | Dementia Risk Detection via Text Augmentation-based Multi-Task Learning |
| 218 | Language-independent Prosody-enhanced Speech Representations for Multilingual Speech Synthesis |
| 219 | Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation |
| 220 | Amphion: An Open-Source Audio, Music and Speech Generation Toolkit |
| 226 | 样本量大小对双元音/ei/话者区分能力评判的影响 |
| 227 | 16年跨度对普通话青年女声声学特征的影响 |
| 230 | NDVQ: Robust Neural Audio Codec with Distribution-Based Vector Quantization |
| 231 | CTC-Assisted LLM-Based Contextual ASR |
| 233 | 抑郁症情感表达的多模态生理数据库构建及分析 |