Research
My research is in artificial intelligence and machine learning, with particular emphasis on multimodal understanding, multimodal generation, and robustness. I am a PhD student in ECE at Duke University, and I work on the following research directions:
Research Areas
Multimodal Understanding
Long-term Memory
Hippocampal-inspired Memory for Video Understanding [HippoMM]
Efficient Processing
Keyframe-oriented Token Pruning for Vision [KVTP]
Context-aware Pruning for Speech [SpeechPrune]
Signal Processing
Speech Envelope Decoding from EEG Signals [EEG-Decoding]
Multimodal Generation
Voice Synthesis
Bilingual Singing Voice Synthesis [BiSinger]
Singing Voice Data Scaling-up [ACE-Opencpop]
Cross-modal Generation
Stable Diffusion-Enhanced Voice Generation [Face2Voice / SDEVG]
Speech Enhancement
Character-Based TV & Movie Speech Dataset [TMCSPEECH]
Zero-Shot Dysarthric Speech Reconstruction [TSVC]
Robustness in AI Systems
Adversarial Examples
Natural Adversarial Examples with Stable Diffusion [SD-NAE]
Out-of-Distribution Detection
Enhanced Benchmark for OOD Detection [OpenOOD]
Vision-Language Robustness
Generalized OOD Detection Survey [OOD-Survey]
Tools & Applications
Research Visualization
Research Trend Visualization Toolkit [RTVis]
AI Frameworks
Singing Voice Synthesis Toolkit [Muskits]
Material Science
Tracking Nanoparticle Diffusion in Polymers [NanoTrack]
Recognition & Achievements
- Oral presentation at Interspeech 2024
- Best Paper Award at AAAI Spring Symposium 2025
- Honorable Mention Demo Award at ACM Multimedia 2024
Industry Experience
Adobe Research
Research Intern · San Jose, CA · Summer 2025
Last edited on May 30, 2025