Yuqi Sun (孙玉齐)

Hi! My name is Yuqi Sun. I received my Ph.D. in 2025 from the School of Computer Science at Fudan University, under the supervision of Dr. Bo Yan. I also received my B.Sc. from Fudan University in 2020.

My research interest lies in leveraging artificial intelligence techniques for data governance (AI for data)—including, but not limited to, data management, data filtering, and data synthesis—to establish a data foundation for building low-cost, high-performing AI models. My previous research focused on multi-view imaging, face images, and rendered images, with a recent shift toward scientific fields such as medical imaging. I strongly believe that data governance is one of the most critical directions for AI innovation, essential for reducing model training costs, and I aim to extend its application to more scientific domains in the future.

After graduation, I co-founded FlyAiTech, a company dedicated to providing data governance and large-model product services to hospitals, public security agencies, manufacturing enterprises, and other organizations.

Email  /  CV  /  Google Scholar  /  GitHub

Updates

2025-05: Two papers were accepted to ACM MM 2025
2025-03: Our new work has been published in Nature Biomedical Engineering (IF: 28.0)!
2024-07: Two papers were accepted to ACM MM 2024

Awards and Honors

  • Graduate Representative Speaker, Fudan University Graduate Commencement Ceremony, 2025
  • Fudan Academic Star Special Award, Fudan University, 2025 (awarded to only six students across the entire university)
  • Outstanding Graduate, Fudan University, 2025
  • Fudan Top 10 Scientific Advances Nominee, Fudan University, 2024
  • Shanghai Yangpu 'Entrepreneurship Star' Emerging Talent Award, 2024
  • National Disruptive Technology Innovation Competition Winner, 2024
  • Featured in Media: Research covered by CCTV News and CCTV Defense and Military Channel, 2024

Research

I present some of my publications here, and more works are ongoing. (*Equal contribution)

MM-Skin: Enhancing Dermatology Vision-Language Model with an Image-Text Dataset Derived from Textbooks
Wenqi Zeng, Yuqi Sun, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Code / Paper

Medical vision-language models (VLMs) show potential as clinical assistants, but dermatology-specific VLMs lack detailed diagnostic capabilities due to limited dataset text descriptions. We introduce MM-Skin, a large-scale multimodal dermatology dataset with 3 imaging modalities (clinical, dermoscopic, pathological) and ~10,000 high-quality image-text pairs from textbooks, plus ~27,000 diverse VQA samples. Using MM-Skin and public datasets, we developed SkinVL, a specialized VLM for accurate skin disease interpretation, advancing clinical dermatology VLM development.

TabiMed: Tabularizing Medical Images for Few-Shot In-Context Diagnosis
Wanying Zhou, Yuqi Sun, Yu Ling, Zhen Xing, Chenxi Ma, Weimin Tan, Bo Yan
ACM MM, 2025
Paper

TabiMed is a novel framework for biomedical image AI that addresses the small-sample problem by transforming visual representations into structured tabular data. Unlike slow and overfitting-prone supervised fine-tuning (SFT) or inefficient zero-shot inference, TabiMed leverages in-context learning (ICL) and pre-trained tabular models to achieve superior accuracy and efficiency, with an average AUC 14.1% higher than zero-shot inference and training 250 times faster than SFT. This approach offers a new and effective way to analyze biomedical images with limited data.
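As a rough illustration of this general "image to tabular features to in-context classifier" idea (not the actual TabiMed implementation), the sketch below embeds images with a generic frozen encoder, compresses the embeddings into a small feature table with PCA, and hands the few labeled rows to a pre-trained tabular model; TabPFN and ResNet-50 are assumed stand-ins, and all component choices are illustrative.

```python
# Illustrative sketch only: a generic few-shot pipeline in the spirit of
# "tabularizing" medical images. The real TabiMed encoder, tabularization
# step, and tabular model may differ.
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from sklearn.decomposition import PCA
from tabpfn import TabPFNClassifier  # pre-trained tabular in-context model (assumed stand-in)

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Frozen vision encoder producing one embedding per image.
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder.fc = torch.nn.Identity()  # keep the 2048-d pooled feature
encoder.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(paths):
    """Encode a list of image paths into a (N, 2048) feature matrix."""
    batch = torch.stack([preprocess(Image.open(p).convert("RGB")) for p in paths])
    return encoder(batch.to(device)).cpu().numpy()

# 2) "Tabularize": compress embeddings into a compact feature table.
def tabularize(train_paths, test_paths, n_features=32):
    X_train, X_test = embed(train_paths), embed(test_paths)
    pca = PCA(n_components=min(n_features, len(train_paths)))
    return pca.fit_transform(X_train), pca.transform(X_test)

# 3) Few-shot diagnosis with an in-context tabular model: no gradient
#    fine-tuning; the few labeled rows are simply given as context via fit().
def few_shot_diagnose(train_paths, train_labels, test_paths):
    X_train, X_test = tabularize(train_paths, test_paths)
    clf = TabPFNClassifier()
    clf.fit(X_train, np.asarray(train_labels))  # context examples
    return clf.predict_proba(X_test)            # class probabilities per test image
```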

A data-efficient strategy for building high-performing medical foundation models
Yuqi Sun*, Weimin Tan*, Zhuoyao Gu, Ruian He, Siyuan Chen, Miao Pang, Bo Yan
Nature Biomedical Engineering, 2025
Code / Paper

Medical foundation models typically require massive datasets, but medical data collection is costly, slow, and privacy-sensitive. We demonstrate that synthetic data, generated with disease labels, can effectively pretrain medical foundation models. Our retinal model, pretrained on one million synthetic retinal images and just 16.7% of the real-world data used by RETFound (904,170 images), matches or exceeds RETFound’s performance across nine public datasets and four diagnostic tasks. We also validate this data-efficient approach by building a tuberculosis classifier on chest X-rays. Text-conditioned synthetic data boosts medical model performance and generalizability with less real data.

Audio-Driven Identity Manipulation for Face Inpainting
Yuqi Sun*, Qing Lin*, Weimin Tan, Bo Yan
ACM MM, 2024
Code / Paper

Our main insight is that a person's voice carries distinct identity markers, such as age and gender, which provide an essential supplement for identity-aware face inpainting. By extracting identity information from audio as guidance, our method can naturally support tasks of identity preservation and identity swapping in face inpainting.

A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models
Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan
ACM MM, 2024
Paper

This paper introduces a comprehensive benchmark specifically for evaluating data-effective learning in the medical field. It includes a dataset with millions of data samples from 31 medical centers (DataDEL), a baseline method for comparison (MedDEL), and a new evaluation metric (NormDEL) to objectively measure data-effective learning performance.

Low-Latency Space-Time Supersampling for Real-Time Rendering
Ruian He*, Shili Zhou*, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan
AAAI, 2024
Code / Paper

We recognize the shared context and mechanisms between frame supersampling and extrapolation, and present a novel framework, Space-time Supersampling (STSS). By integrating the two into a unified framework, STSS improves overall quality with lower latency. Notably, this performance is achieved within only 4 ms, saving up to 75% of the time required by the conventional two-stage pipeline, which takes 17 ms.

Instruct-NeuralTalker: Editing Audio-Driven Talking Radiance Fields with Instructions
Yuqi Sun, Ruian He, Weimin Tan, Bo Yan
arXiv, 2023
Paper

We propose Instruct-NeuralTalker, the first interactive framework to semantically edit audio-driven talking radiance fields with simple human instructions. It supports various talking-face editing tasks, including instruction-based editing, novel view synthesis, and background replacement. In addition, Instruct-NeuralTalker enables real-time rendering on consumer hardware.

Geometry-Aware Reference Synthesis for Multi-View Image Super-Resolution
Ri Cheng, Yuqi Sun, Bo Yan, Weimin Tan, Chenxi Ma
ACM MM, 2022
Code / Paper

This paper proposes the Multi-View Image Super-Resolution (MVISR) task, which aims to increase the resolution of multi-view images captured from the same scene. One solution is to apply image or video super-resolution (SR) methods to reconstruct high-resolution (HR) results from the low-resolution (LR) input views.

Learning Robust Image-Based Rendering on Sparse Scene Geometry via Depth Completion
Yuqi Sun, Shili Zhou, Ri Cheng, Weimin Tan, Bo Yan*, Lang Fu
CVPR, 2022
Code / Video / Paper

Recent image-based rendering (IBR) methods usually adopt plenty of views to reconstruct dense scene geometry. However, the number of available views is limited in practice. When only few views are provided, the performance of these methods drops off significantly, as the scene geometry becomes sparse as well. Therefore, in this paper, we propose Sparse-IBRNet (SIBRNet) to perform robust IBR on sparse scene geometry by depth completion.

Space-Angle Super-Resolution for Multi-View Images
Yuqi Sun*, Ri Cheng*, Bo Yan, Shili Zhou
ACM MM, 2021
Code / Paper

The limited spatial and angular resolutions in multi-view multimedia applications restrict the visual experience in practical use. In this paper, we first formulate the space-angle super-resolution (SASR) problem for irregularly arranged multi-view images. It aims to jointly increase the spatial resolution of the source views and synthesize arbitrary virtual high-resolution (HR) views between them.

Experience

Some internship experiences

Autonomous Driving Division, SenseTime
May 2019 - 2020

  • Dynamic scene sensing
  • Simulator data processing and driving trajectory prediction
Shanghai Fuhuan Science and Technology
May 2019 - 2020

  • Research on facial skin health detection algorithms

Social

I will share personal photography, videos, and articles on my social media.


Website template from Yuqi Sun