|
Yuqi Sun (孙玉齐)
Hi! My name is Yuqi Sun. I received my Ph.D. in 2025 from the School of Computer Science at Fudan University, under the supervision of Dr. Bo Yan. I also received my B.Sc. from Fudan University in 2020.
My research interest lies in leveraging artificial intelligence techniques for data governance (AI for data), including
but not limited to data management, data filtering, and data synthesis, to establish a data foundation for
building low-cost, high-performing AI models. My previous research focused on multi-view imaging, face images,
and rendered images, with a recent shift toward scientific fields such as medical imaging. I strongly believe
that data governance is one of the most critical directions for AI innovation and is essential for reducing model
training costs, and I aim to extend its application to more scientific domains in the future.
After graduation, I co-founded FlyAiTech, a company dedicated to providing data governance and large-model product services to
hospitals, public security agencies, manufacturing enterprises, and other organizations.
Email / CV / Google Scholar / GitHub
|
|
|
Updates
2025-05: Two papers were accepted to ACM MM 2025
2025-03: Our new work has been published in Nature Biomedical Engineering (IF: 28.0)!
2024-07: Two papers were accepted to ACM MM 2024
|
|
Awards and Honors
- Graduate Representative Speaker, Fudan University Graduate Commencement Ceremony, 2025
- Fudan Academic Star Special Award, Fudan University, 2025 (Awarded to only 6 students across the entire university)
- Outstanding Graduate, Fudan University, 2025
- Fudan Top 10 Scientific Advances Nominee, Fudan University, 2024
- Shanghai Yangpu 'Entrepreneurship Star' Emerging Talent Award, 2024
- National Disruptive Technology Innovation Competition Winner, 2024
- Featured in Media: Research covered by CCTV News and CCTV Defense and Military Channel, 2024
|
|
Research
I present some of my publications here; more work is ongoing. (*Equal contribution)
|
|
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Code
/
Paper
Medical vision-language models (VLMs) show potential as clinical assistants, but dermatology-specific VLMs
lack detailed diagnostic capability because existing datasets provide only limited text descriptions. We introduce MM-Skin,
a large-scale multimodal dermatology dataset spanning three imaging modalities (clinical, dermoscopic, pathological), with ~10,000 high-quality
image-text pairs derived from textbooks plus ~27,000 diverse VQA samples. Using MM-Skin and public datasets, we developed SkinVL,
a specialized VLM for accurate skin disease interpretation, advancing clinical dermatology VLM development.
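For illustration, a single sample in each of the two supervision formats might look like the sketch below (field names are hypothetical, not MM-Skin's actual schema):

```python
# Hypothetical examples of the two supervision formats described above;
# field names are illustrative, not MM-Skin's actual schema.
image_text_pair = {
    "image": "dermoscopic/00042.png",   # clinical | dermoscopic | pathological
    "modality": "dermoscopic",
    "caption": "Dermoscopic image of an asymmetric pigmented lesion "
               "with an atypical network, consistent with melanoma.",
    "source": "textbook",
}

vqa_sample = {
    "image": "clinical/01337.png",
    "question": "What is the most likely diagnosis for this lesion?",
    "answer": "Basal cell carcinoma.",
}
```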
|
|
TabiMed: Tabularizing Medical Images for Few-Shot In-Context Diagnosis
Wanying Zhou, Yuqi Sun, Yu Ling, Zhen Xing, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Paper
TabiMed is a novel framework for biomedical image analysis that addresses the small-sample problem by transforming visual
representations into structured tabular data. Unlike supervised fine-tuning (SFT), which is slow and prone to overfitting,
and zero-shot inference, which is inefficient, TabiMed leverages in-context learning (ICL) with pre-trained tabular models to achieve
superior accuracy and efficiency, with an average AUC 14.1% higher than zero-shot inference and training 250 times
faster than SFT. This approach offers a new and effective way to analyze biomedical images with limited data.
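As a rough illustration of the tabular-ICL idea, here is a minimal sketch (not the authors' implementation), assuming a frozen vision encoder and a TabPFN-style pre-trained tabular model:

```python
# Minimal sketch of few-shot diagnosis via tabularized image features.
# Assumptions: a frozen vision backbone provides per-image embeddings,
# and a pre-trained tabular in-context learner (here TabPFN) classifies
# queries from the labeled support set without any gradient updates.
import numpy as np
from tabpfn import TabPFNClassifier  # pip install tabpfn

def embed(images):
    # Placeholder: in practice, use a frozen (medical) vision encoder
    # and keep the feature count small, since tabular ICL models expect
    # a modest number of columns.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), 32))

support_images = list(range(16))            # a few labeled examples
support_labels = np.array([0, 1] * 8)       # e.g., benign vs. malignant
query_images = list(range(4))

X_support, X_query = embed(support_images), embed(query_images)

clf = TabPFNClassifier()                    # pre-trained tabular model
clf.fit(X_support, support_labels)          # "fit" = in-context conditioning
probs = clf.predict_proba(X_query)          # one forward pass, no fine-tuning
```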
|
|
A data-efficient strategy for building high-performing medical foundation models
Yuqi Sun*, Weimin Tan*, Zhuoyao Gu, Ruian He, Siyuan Chen, Miao Pang, Bo Yan
Nature Biomedical Engineering, 2025
Code
/
Paper
Medical foundation models typically require massive datasets, but medical data collection is costly, slow, and privacy-sensitive.
We demonstrate that synthetic data, generated under the guidance of disease labels, can effectively pretrain medical foundation models.
Our retinal model, pretrained on one million synthetic retinal images and just 16.7% of the real-world data used by RETFound (904,170 images),
matches or exceeds RETFound's performance across nine public datasets and four diagnostic tasks. We also validate this data-efficient approach by building
a tuberculosis classifier on chest X-rays. Text-conditioned synthetic data boosts medical model performance and generalizability while requiring less real data.
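The strategy can be summarized as a three-stage pipeline; the sketch below is a schematic outline only (stub modules and a toy reconstruction objective stand in for the paper's generator and pretraining setup):

```python
# Schematic of the data-efficient strategy, not the paper's code.
# Stage 1: synthesize labeled images with a text-conditioned generator.
# Stage 2: pretrain a foundation backbone on the synthetic corpus.
# Stage 3: fine-tune on a much smaller real, labeled dataset.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a foundation-model backbone (e.g., a ViT)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
    def forward(self, x):
        return self.net(x)

def synthesize(prompts, n_per_prompt=4):
    # Placeholder for a text-conditioned generator prompted with disease
    # labels (e.g., a diffusion model); returns random tensors here.
    return torch.randn(len(prompts) * n_per_prompt, 3, 32, 32)

# Stage 1: label-derived prompts -> synthetic corpus.
prompts = ["fundus photograph, diabetic retinopathy",
           "fundus photograph, healthy"]
synthetic = synthesize(prompts)

# Stage 2: pretraining on synthetic images (a trivial reconstruction loss
# here; the real setting would use a proper self-supervised objective
# such as masked autoencoding).
encoder, decoder = TinyEncoder(), nn.Linear(64, 3 * 32 * 32)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
for _ in range(10):
    loss = nn.functional.mse_loss(decoder(encoder(synthetic)),
                                  synthetic.flatten(1))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: fine-tune encoder + task head on limited real data.
real_x, real_y = torch.randn(32, 3, 32, 32), torch.randint(0, 2, (32,))
head = nn.Linear(64, 2)
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-4)
for _ in range(10):
    loss = nn.functional.cross_entropy(head(encoder(real_x)), real_y)
    opt.zero_grad(); loss.backward(); opt.step()
```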
|
|
Audio-Driven Identity Manipulation for Face Inpainting
Yuqi Sun*, Qing Lin*, Weimin Tan, Bo Yan
ACM MM, 2024
Code
/
Paper
Our main insight is that a person's voice carries distinct identity markers, such as age and gender,
which provide an essential supplement for identity-aware face inpainting. By extracting identity information from audio as guidance,
our method naturally supports both identity preservation and identity swapping in face inpainting.
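To make the conditioning concrete, here is a minimal sketch of the idea (schematic stand-in modules with illustrative sizes, not the paper's architecture):

```python
# Audio-conditioned face inpainting, schematically: an identity embedding
# extracted from speech guides the generator filling the masked region.
import torch
import torch.nn as nn

class AudioIdentityEncoder(nn.Module):
    """Maps a speech clip (log-mel spectrogram) to an identity embedding."""
    def __init__(self, n_mels=80, n_frames=100, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(),
                                 nn.Linear(n_mels * n_frames, dim))
    def forward(self, mel):
        return self.net(mel)

class InpaintingGenerator(nn.Module):
    """Fills masked face regions, conditioned on the audio embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(3 * 64 * 64 + dim, 3 * 64 * 64)
    def forward(self, masked_face, audio_id):
        x = torch.cat([masked_face.flatten(1), audio_id], dim=1)
        return self.fuse(x).view(-1, 3, 64, 64)

face = torch.randn(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0                   # region to inpaint
mel = torch.randn(1, 80, 100)                   # speaker's audio features

audio_id = AudioIdentityEncoder()(mel)          # identity guidance
completed = InpaintingGenerator()(face * (1 - mask), audio_id)
# Feeding a different speaker's audio instead would steer the completion
# toward that identity (the "identity swapping" setting).
```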
|
|
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models
Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan
ACM MM, 2024
Paper
This paper introduces a comprehensive benchmark specifically for evaluating data-effective learning in the medical field. The benchmark includes
a dataset with millions of data samples from 31 medical centers
(DataDEL), a baseline method for comparison (MedDEL), and a new
evaluation metric (NormDEL) to objectively measure data-effective
learning performance.
|
|
Low-Latency Space-Time Supersampling for Real-Time Rendering
Ruian He*, Shili Zhou*, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan
AAAI, 2024
Code
/
Paper
We recognize the shared context and mechanisms between frame supersampling and extrapolation,
and present a novel framework, Space-time Supersampling (STSS). By integrating the two into a unified framework,
STSS improves overall quality at lower latency. Notably, STSS runs within
only 4 ms, saving up to 75% of the time required by the conventional two-stage pipeline, which takes 17 ms.
|
|
Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions
Yuqi Sun, Ruian He, Weimin Tan, Bo Yan
arXiv, 2023
Paper
We propose Instruct-NeuralTalker, the first interactive framework
for semantically editing audio-driven talking radiance fields
with simple human instructions. It supports various talking face editing
tasks, including instruction-based editing, novel view synthesis,
and background replacement. In addition, Instruct-NeuralTalker
enables real-time rendering on consumer hardware.
|
|
Geometry-Aware Reference Synthesis for Multi-View Image Super-Resolution
Ri Cheng, Yuqi Sun, Bo Yan, Weimin Tan, Chenxi Ma
ACM MM, 2022
Code
/
Paper
This paper proposes the Multi-View Image Super-Resolution (MVISR) task, which aims to increase the resolution of multi-view images captured from the same scene. One solution is to apply
image or video super-resolution (SR) methods to reconstruct high-resolution (HR)
results from the low-resolution (LR) input views.
|
|
Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion
Yuqi Sun, Shili Zhou, Ri Cheng, Weimin Tan, Bo Yan*, Lang Fu
CVPR, 2022
Code
/
Video
/
Paper
Recent image-based rendering (IBR) methods usually
adopt many views to reconstruct dense scene geometry.
However, the number of available views is limited in practice.
When only a few views are provided, the performance
of these methods drops off significantly, as the scene geometry
becomes sparse as well. Therefore, in this paper, we
propose Sparse-IBRNet (SIBRNet) to perform robust IBR
on sparse scene geometry via depth completion.
|
|
Space-Angle Super-Resolution for Multi-View Images
Yuqi Sun*, Ri Cheng*, Bo Yan, Shili Zhou
ACM MM, 2021
Code
/
Paper
The limited spatial and angular resolutions of multi-view imagery restrict the visual experience in practical multimedia
applications. In this paper, we first formulate the space-angle super-resolution (SASR)
problem for irregularly arranged multi-view images. It aims to jointly increase
the spatial resolution of the source views and synthesize arbitrary
virtual high-resolution (HR) views between them.
|
|
Experience
Some internship experiences
|
|