Zhikang Niu (牛志康)


Ph.D. Student, Cross Media (X-)Language Intelligence Lab,
Department of Computer Science and Engineering, Shanghai Jiao Tong University.
Working at Shanghai Innovation Institute.

Research Interests | Education and Internships | Publications | Projects | Honors and Awards | Activities

If you have any questions, please feel free to contact me at zhikangniu@sjtu.edu.cn.
[GitHub] [Google Scholar] [WeChat] [CV]

Research Interests

I work on Audio Signal Processing, Audio Codec Models, Multimodal Large Language Models, Machine Learning, and Deep Learning, supervised by Prof. Xie Chen. I will try my best in the next five exciting years! 💪 Currently, I focus on the following research topics:

Education and Internships

Publications

  • Zhikang Niu, Sanyuan Chen, Long Zhou, Ziyang Ma, Xie Chen, Shujie Liu.
    NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization.
    IEEE Spoken Language Technology Workshop (SLT 2024), [Link] [PDF] [BibTeX]

  • Yushen Chen, Zhikang Niu, Ziyang Ma, Keqi Deng, Chunhui Wang, Jian Zhao, Kai Yu, Xie Chen.
    F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching.
    F5-TTS has collected 10,000+ stars on GitHub.
    [Link] [PDF] [Code] [BibTeX] [Talk]

  • Ruiqi Yan, Xiquan Li, Wenxi Chen, Zhikang Niu, Chen Yang, Ziyang Ma, Kai Yu, Xie Chen.
    URO-Bench: A Comprehensive Benchmark for End-to-End Spoken Dialogue Models.
    [Link] [PDF] [BibTeX]

  • Guanrou Yang, Ziyang Ma, Zhisheng Zheng, Yakun Song, Zhikang Niu, Xie Chen*.
    Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning.
    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2023), [Link] [PDF] [BibTeX]

  • Wenxi Chen, Ziyang Ma, Ruiqi Yan, Yuzhe Liang, Xiquan Li, Ruiyang Xu, Zhikang Niu, et al.
    SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training.
    [Link] [PDF]

  • Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen.
    VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech.
    IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), [Link] [PDF] [BibTeX]

Projects

Open-Source Projects:
  • thorough-pytorch: A Chinese-language PyTorch tutorial that has collected 2,300+ stars and 333 forks on GitHub.
  • CSBasicKnowledge: A repository of notes on computer science, artificial intelligence, and EE. It has collected 560+ stars on GitHub.
  • More open-source contents can be found on my GitHub.
Research Projects:

Honors and Awards

  • 2024, Stars of Tomorrow, Microsoft Research Asia.
  • 2022, National Scholarship, Ministry of Education of China.
  • 2021, Meritorious Winner, Interdisciplinary Contest in Modeling.
  • 2021, 2023, First Prize Scholarship, Xidian University.

Activities

  • 2023.09-2024.09, Maintainer of CS-BAOYAN (an open-source organization).
  • 2021.11-present, Datawhale member (an open-source AI organization), helping data science enthusiasts get involved in the AI community.

Last updated: 2025.3.3