Research Interest
I work on audio signal processing, audio codec models, multimodal large language models, machine learning, and deep learning, supervised by
Prof. Xie Chen. I will try my best in the next five exciting years! 💪 Currently, I focus on the following research topics:
- Audio Tokenizer (Discrete and Continuous)
- Speech Synthesis (Text to Speech)
- Multimodal Large Language Model
Education and Internships
Publications
Speech Synthesis/Omni System
- Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu.
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization.
SLT 2024,
[Link]
[PDF]
[Code]
[BibTeX]
- Jeongsoo Choi*, Zhikang Niu*, Ji-Hoon Kim, Chunhui Wang, Joon Son Chung, Xie Chen.
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment.
InterSpeech 2025,
[Link]
[PDF]
[Code]
- Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen.
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.
ACL 2025 Main,
[Link]
[PDF]
[Code]
[BibTeX]
[Talk]
F5-TTS has collected 12,000+ stars on GitHub.
- Qixi Zheng, Yushen Chen, Zhikang Niu, Ziyang Ma, Xiaofei Wang, Kai Yu, Xie Chen.
Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling.
InterSpeech 2025,
[Link]
[PDF]
- Yuzhe Liang, Wenzhe Liu, Chunyu Qiang, Zhikang Niu, Yushen Chen, Ziyang Ma, Wenxi Chen, Nan Li, Chen Zhang, Xie Chen.
Towards Flow-Matching-based TTS without Classifier-Free Guidance.
[Link]
[PDF]
- Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, et al.
SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training.
ACL 2025 Findings,
[Link]
[PDF]
[Code]
[BibTeX]
- Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen.
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
ICASSP 2025,
[Link]
[PDF]
[BibTeX]
- Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen.
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
ASRU 2023,
[Link]
[PDF]
[Code]
[BibTeX]
Benchmark
- MMAR Team.
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix.
[Link]
[PDF]
[Code]
- Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen.
URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models.
[Link]
[PDF]
[Code]
[BibTeX]
Projects
Open-Source Projects:
- thorough-pytorch: A Chinese PyTorch tutorial that has collected more than 2,300 stars and 333 forks on GitHub.
- CSBasicKnowledge: A repository of notes on computer science, artificial intelligence, and EE. It has collected more than 560 stars on GitHub.
- More open-source content can be found on my GitHub.
Research Projects
Honors and Awards
- 2024, Third Prize, 21/1600, Wenxin Cup Entrepreneurship Competition, Baidu.
- 2024, Stars of Tomorrow, Microsoft Research Asia.
- 2022, National Scholarship, Ministry of Education in China.
- 2021, Meritorious Winner, Interdisciplinary Contest in Modeling.
- 2021, 2023, First Prize Scholarship, Xidian University.
Activities
- 2023.09-2024.09, CS-BAOYAN owner (an open-source CS-BAOYAN organization).
- 2021.11-Present, Datawhale member (an open-source AI organization), helping data science enthusiasts get involved in the AI community.
Last updated: 2025.5.16