|
Yuqi Sun (孙玉齐)
Hi! My name is Yuqi Sun. I received my Ph.D. in 2025 from the School of Computer Science at Fudan University, under the supervision of Dr. Bo Yan. I also received my B.Sc. from Fudan University in 2020.
My research interest lies in leveraging artificial intelligence techniques for data governance (AI for data), including
but not limited to data management, data filtering, and data synthesis, to establish a data foundation for
building low-cost, high-performing AI models. My previous research focused on multi-view imaging, face images,
and rendered images, with a recent shift toward scientific fields such as medical imaging. I strongly believe
that data governance is one of the most critical directions for AI innovation and is essential for reducing model
training costs, and I aim to extend its application to more scientific domains in the future.
After graduation, I co-founded FlyAiTech, a company dedicated to providing data governance and large-model product services to
hospitals, public security agencies, manufacturing enterprises, and other organizations.
Email / CV / Google Scholar / GitHub
|
|
|
Updates
2025-05: Two papers were accepted to ACM MM 2025
2025-03: Our new work has been published in Nature Biomedical Engineering (IF: 28.0)!
2024-07: Two papers were accepted to ACM MM 2024
|
|
Awards and Honors
- Graduate Representative Speaker, Fudan University Graduate Commencement Ceremony, 2025
- Fudan Academic Star Special Award, Fudan University, 2025 (Awarded to only 6 students across the entire university)
- Outstanding Graduate, Fudan University, 2025
- Fudan Top 10 Scientific Advances Nominee, Fudan University, 2024
- Shanghai Yangpu 'Entrepreneurship Star' Emerging Talent Award, 2024
- National Disruptive Technology Innovation Competition Winner, 2024
- Featured in Media: Research covered by CCTV News and CCTV Defense and Military Channel, 2024
|
|
Research
I present some of my publications here; more work is ongoing. (*Equal contribution)
|
|
MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Code
/
Paper
Medical vision-language models (VLMs) show potential as clinical assistants, but dermatology-specific VLMs
lack detailed diagnostic capability because existing datasets provide only limited text descriptions. We introduce MM-Skin,
a large-scale multimodal dermatology dataset spanning three imaging modalities (clinical, dermoscopic, pathological), with ~10,000 high-quality
image-text pairs derived from textbooks plus ~27,000 diverse VQA samples. Using MM-Skin and public datasets, we developed SkinVL,
a specialized VLM for accurate skin disease interpretation, advancing clinical dermatology VLM development.
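For illustration, a single sample in each of the two supervision formats might look like the sketch below (field names are hypothetical, not MM-Skin's actual schema):

```python
# Hypothetical examples of the two supervision formats described above;
# field names are illustrative, not MM-Skin's actual schema.
image_text_pair = {
    "image": "dermoscopic/00042.png",   # clinical | dermoscopic | pathological
    "modality": "dermoscopic",
    "caption": "Dermoscopic image of an asymmetric pigmented lesion "
               "with an atypical network, consistent with melanoma.",
    "source": "textbook",
}

vqa_sample = {
    "image": "clinical/01337.png",
    "question": "What is the most likely diagnosis for this lesion?",
    "answer": "Basal cell carcinoma.",
}
```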
|
|
TabiMed: Tabularizing Medical Images for Few-Shot In-Context Diagnosis
Wanying Zhou, Yuqi Sun, Yu Ling, Zhen Xing, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Paper
TabiMed is a novel framework for biomedical image analysis that addresses the small-sample problem by transforming visual
representations into structured tabular data. Unlike supervised fine-tuning (SFT), which is slow and prone to overfitting,
and zero-shot inference, which is inefficient, TabiMed leverages in-context learning (ICL) with pre-trained tabular models to achieve
superior accuracy and efficiency, with an average AUC 14.1% higher than zero-shot inference and training 250 times
faster than SFT. This approach offers a new and effective way to analyze biomedical images with limited data.
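As a rough illustration of the tabular-ICL idea, here is a minimal sketch (not the authors' implementation), assuming a frozen vision encoder and a TabPFN-style pre-trained tabular model:

```python
# Minimal sketch of few-shot diagnosis via tabularized image features.
# Assumptions: a frozen vision backbone provides per-image embeddings,
# and a pre-trained tabular in-context learner (here TabPFN) classifies
# queries from the labeled support set without any gradient updates.
import numpy as np
from tabpfn import TabPFNClassifier  # pip install tabpfn

def embed(images):
    # Placeholder: in practice, use a frozen (medical) vision encoder
    # and keep the feature count small, since tabular ICL models expect
    # a modest number of columns.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(images), 32))

support_images = list(range(16))            # a few labeled examples
support_labels = np.array([0, 1] * 8)       # e.g., benign vs. malignant
query_images = list(range(4))

X_support, X_query = embed(support_images), embed(query_images)

clf = TabPFNClassifier()                    # pre-trained tabular model
clf.fit(X_support, support_labels)          # "fit" = in-context conditioning
probs = clf.predict_proba(X_query)          # one forward pass, no fine-tuning
```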
|
|
A data-efficient strategy for building high-performing medical foundation models
Yuqi Sun*, Weimin Tan*, Zhuoyao Gu, Ruian He, Siyuan Chen, Miao Pang, Bo Yan
Nature Biomedical Engineering, 2025
Code
/
Paper
Medical foundation models typically require massive datasets, but medical data collection is costly, slow, and privacy-sensitive.
We demonstrate that synthetic data, generated under the guidance of disease labels, can effectively pretrain medical foundation models.
Our retinal model, pretrained on one million synthetic retinal images and just 16.7% of the real-world data used by RETFound (904,170 images),
matches or exceeds RETFound's performance across nine public datasets and four diagnostic tasks. We also validate this data-efficient approach by building
a tuberculosis classifier on chest X-rays. Text-conditioned synthetic data boosts medical model performance and generalizability while requiring less real data.
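The strategy can be summarized as a three-stage pipeline; the sketch below is a schematic outline only (stub modules and a toy reconstruction objective stand in for the paper's generator and pretraining setup):

```python
# Schematic of the data-efficient strategy, not the paper's code.
# Stage 1: synthesize labeled images with a text-conditioned generator.
# Stage 2: pretrain a foundation backbone on the synthetic corpus.
# Stage 3: fine-tune on a much smaller real, labeled dataset.
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a foundation-model backbone (e.g., a ViT)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
    def forward(self, x):
        return self.net(x)

def synthesize(prompts, n_per_prompt=4):
    # Placeholder for a text-conditioned generator prompted with disease
    # labels (e.g., a diffusion model); returns random tensors here.
    return torch.randn(len(prompts) * n_per_prompt, 3, 32, 32)

# Stage 1: label-derived prompts -> synthetic corpus.
prompts = ["fundus photograph, diabetic retinopathy",
           "fundus photograph, healthy"]
synthetic = synthesize(prompts)

# Stage 2: pretraining on synthetic images (a trivial reconstruction loss
# here; the real setting would use a proper self-supervised objective
# such as masked autoencoding).
encoder, decoder = TinyEncoder(), nn.Linear(64, 3 * 32 * 32)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)
for _ in range(10):
    loss = nn.functional.mse_loss(decoder(encoder(synthetic)),
                                  synthetic.flatten(1))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: fine-tune encoder + task head on limited real data.
real_x, real_y = torch.randn(32, 3, 32, 32), torch.randint(0, 2, (32,))
head = nn.Linear(64, 2)
opt = torch.optim.Adam([*encoder.parameters(), *head.parameters()], lr=1e-4)
for _ in range(10):
    loss = nn.functional.cross_entropy(head(encoder(real_x)), real_y)
    opt.zero_grad(); loss.backward(); opt.step()
```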
|
|
Audio-Driven Identity Manipulation for Face Inpainting
Yuqi Sun*, Qing Lin*, Weimin Tan, Bo Yan
ACM MM, 2024
Code
/
Paper
Our main insight is that a person's voice carries distinct identity markers, such as age and gender,
which provide an essential supplement for identity-aware face inpainting. By extracting identity information from audio as guidance,
our method naturally supports both identity preservation and identity swapping in face inpainting.
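To make the conditioning concrete, here is a minimal sketch of the idea (schematic stand-in modules with illustrative sizes, not the paper's architecture):

```python
# Audio-conditioned face inpainting, schematically: an identity embedding
# extracted from speech guides the generator filling the masked region.
import torch
import torch.nn as nn

class AudioIdentityEncoder(nn.Module):
    """Maps a speech clip (log-mel spectrogram) to an identity embedding."""
    def __init__(self, n_mels=80, n_frames=100, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(),
                                 nn.Linear(n_mels * n_frames, dim))
    def forward(self, mel):
        return self.net(mel)

class InpaintingGenerator(nn.Module):
    """Fills masked face regions, conditioned on the audio embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.fuse = nn.Linear(3 * 64 * 64 + dim, 3 * 64 * 64)
    def forward(self, masked_face, audio_id):
        x = torch.cat([masked_face.flatten(1), audio_id], dim=1)
        return self.fuse(x).view(-1, 3, 64, 64)

face = torch.randn(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0                   # region to inpaint
mel = torch.randn(1, 80, 100)                   # speaker's audio features

audio_id = AudioIdentityEncoder()(mel)          # identity guidance
completed = InpaintingGenerator()(face * (1 - mask), audio_id)
# Feeding a different speaker's audio instead would steer the completion
# toward that identity (the "identity swapping" setting).
```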
|
|
A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models
Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan
ACM MM, 2024
Paper
This paper introduces a comprehensive benchmark specifically for evaluating data-effective learning in the medical field. The benchmark includes
a dataset with millions of data samples from 31 medical centers
(DataDEL), a baseline method for comparison (MedDEL), and a new
evaluation metric (NormDEL) to objectively measure data-effective
learning performance.
|
|
Low-Latency Space-Time Supersampling for Real-Time Rendering
Ruian He*, Shili Zhou*, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan
AAAI, 2024
Code
/
Paper
We recognize the shared context and mechanisms between frame supersampling and extrapolation,
and present a novel framework, Space-time Supersampling (STSS). By integrating the two into a unified framework,
STSS improves overall quality at lower latency. Notably, STSS runs within
only 4 ms, saving up to 75% of the time required by the conventional two-stage pipeline, which takes 17 ms.
|
|
Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions
Yuqi Sun, Ruian He, Weimin Tan, Bo Yan
arXiv, 2023
Paper
We propose Instruct-NeuralTalker, the first interactive framework
for semantically editing audio-driven talking radiance fields
with simple human instructions. It supports various talking face editing
tasks, including instruction-based editing, novel view synthesis,
and background replacement. In addition, Instruct-NeuralTalker
enables real-time rendering on consumer hardware.
|
|
Geometry-Aware Reference Synthesis for Multi-View Image Super-Resolution
Ri Cheng, Yuqi Sun, Bo Yan, Weimin Tan, Chenxi Ma
ACM MM, 2022
Code
/
Paper
This paper proposes the Multi-View Image Super-Resolution (MVISR) task, which aims to increase the resolution of multi-view images captured from the same scene. One solution is to apply
image or video super-resolution (SR) methods to reconstruct high-resolution (HR)
results from the low-resolution (LR) input views.
|
|
Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion
Yuqi Sun, Shili Zhou, Ri Cheng, Weimin Tan, Bo Yan*, Lang Fu
CVPR, 2022
Code
/
Video
/
Paper
Recent image-based rendering (IBR) methods usually
adopt many views to reconstruct dense scene geometry.
However, the number of available views is limited in practice.
When only a few views are provided, the performance
of these methods drops off significantly, as the scene geometry
becomes sparse as well. Therefore, in this paper, we
propose Sparse-IBRNet (SIBRNet) to perform robust IBR
on sparse scene geometry via depth completion.
|
|
Space-Angle Super-Resolution for Multi-View Images
Yuqi Sun*, Ri Cheng*, Bo Yan, Shili Zhou
ACM MM, 2021
Code
/
Paper
The limited spatial and angular resolutions of multi-view imagery restrict the visual experience in practical multimedia
applications. In this paper, we first formulate the space-angle super-resolution (SASR)
problem for irregularly arranged multi-view images. It aims to jointly increase
the spatial resolution of the source views and synthesize arbitrary
virtual high-resolution (HR) views between them.
|
|
Experience
Some internship experiences
|
|