VLM3D Challenge – Task 4: Text‑Conditional CT Generation

Welcome to Task 4 of the Vision‑Language Modeling in 3D Medical Imaging (VLM3D) Challenge. In this task, teams must synthesize realistic 3‑D chest CT volumes from free‑form radiology text prompts.


Contents

  1. Overview
  2. Dataset
  3. Task Objective
  4. Participation Rules
  5. Evaluation & Ranking
  6. Prizes & Publication
  7. Citation
  8. Contact

Overview

Synthetic 3‑D imaging unlocks:

  • Data augmentation for scarce pathologies
  • Privacy‑preserving sharing of realistic cases
  • Pre‑training of downstream models

Task 4 challenges participants to convert clinical text (radiology reports) into high‑fidelity chest CT scans that faithfully reflect the described anatomy and pathology.


Dataset

Split           Patients   CT Volumes   Reports   Source
Train           20,000     ≈47,000      20,000    Istanbul Medipol University
Validation      1,304      ≈3,000       1,564     Istanbul Medipol University
Internal Test   2,000      2,000        hidden    Istanbul Medipol University
External Test   1,024      1,024        hidden    Boston University Hospital

Each report serves as the conditioning prompt; each NIfTI volume is the target output.


Task Objective

Given a radiology report, generate a 3‑D NIfTI chest CT volume that:

  • Matches anatomical context (lungs, mediastinum, pleura)
  • Reflects all described pathologies (e.g., “right lower‑lobe nodule 5 mm”)
  • Exhibits realistic Hounsfield unit distributions, voxel spacing, and slice thickness

Each output volume must use the voxel spacing specified in the submission template.
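
For reference, here is a minimal sketch of wrapping a generated array as a NIfTI volume with explicit voxel spacing, using nibabel. The array shape, spacing values, and output path below are illustrative assumptions, not challenge requirements; the submission template is authoritative.

import numpy as np
import nibabel as nib

def save_ct_volume(volume_hu, spacing_xyz=(0.75, 0.75, 1.5),
                   out_path="generated_ct.nii.gz"):
    """Save an (X, Y, Z) array of Hounsfield units as a NIfTI volume."""
    # Axis-aligned affine encoding the voxel spacing in millimetres.
    affine = np.diag([*spacing_xyz, 1.0])
    img = nib.Nifti1Image(volume_hu.astype(np.int16), affine)
    img.header.set_zooms(spacing_xyz)
    nib.save(img, out_path)

# Example: a dummy 512 x 512 x 300 volume filled with air (-1000 HU).
save_ct_volume(np.full((512, 512, 300), -1000, dtype=np.int16))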


Participation Rules

  • Inference: Fully automatic – no manual editing.
  • Training data: CT‑RATE + public data/models permitted.
  • External masks: Allowed, but output must be a full CT volume.
  • Submissions: One compressed archive per scan (see the packaging sketch after this list); at most one run per day; the last run counts.
  • Organizers: Visible on leaderboard, not prize‑eligible.
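
A hypothetical packaging helper for the one-archive-per-scan rule; the archive naming scheme below is an assumption, so defer to the official submission template for the required layout.

import tarfile
from pathlib import Path

def package_scan(nifti_path):
    """Bundle a single generated NIfTI scan into its own .tar.gz archive."""
    nifti = Path(nifti_path)
    archive = nifti.parent / (nifti.name.split(".")[0] + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(nifti, arcname=nifti.name)  # store only the file, no directories
    return archive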

Evaluation & Ranking

Generation Metrics

Metric          Role
FVD (I3D)       Spatio‑temporal fidelity of the slice sequence
FVD (CT‑Net)    CT‑specific Fréchet distance (anatomical realism)
CT‑CLIP Score   Text‑image semantic alignment
FID             Global visual realism

Metrics are averaged over the test set.
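
FID and both FVD variants reduce to the same underlying computation: a Fréchet distance between Gaussians fitted to real and generated feature embeddings. The sketch below assumes (N, D) embedding arrays produced by whichever feature extractor applies (I3D, CT‑Net, Inception); it is not the official evaluation code.

import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (N, D) embedding sets."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean, _ = linalg.sqrtm(cov_r @ cov_g, disp=False)
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean.real))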

Final Ranking

  1. Compute all metrics above.
  2. For each metric, run a two‑sided permutation test (10,000 resamples) between each pair of teams.
  3. Award 1 point per significant win; sum points across metrics.
  4. Order teams by total points (higher = better). Ties share the same rank.

Missing volumes receive the worst score for that scan.
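
A minimal sketch of the pairwise significance scheme, assuming paired per-scan scores for each team on a single metric. The mean-difference statistic, sign-flip resampling, and 0.05 threshold are assumptions, not official evaluation details.

import numpy as np

def permutation_pvalue(a, b, n_perm=10_000, seed=0):
    """Two-sided paired permutation test on the mean score difference."""
    rng = np.random.default_rng(seed)
    diffs = a - b                      # paired per-scan differences
    observed = abs(diffs.mean())
    hits = 0
    for _ in range(n_perm):
        signs = rng.choice((-1.0, 1.0), size=diffs.size)  # random sign flips
        if abs((signs * diffs).mean()) >= observed:
            hits += 1
    return hits / n_perm

def metric_points(scores, higher_is_better=True, alpha=0.05):
    """One point per statistically significant pairwise win on one metric."""
    points = {team: 0 for team in scores}
    teams = list(scores)
    for i, t1 in enumerate(teams):
        for t2 in teams[i + 1:]:
            if permutation_pvalue(scores[t1], scores[t2]) < alpha:
                t1_better = scores[t1].mean() > scores[t2].mean()
                points[t1 if t1_better == higher_is_better else t2] += 1
    return points

Summing these per-metric points across all four metrics yields the totals used in step 4.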


Prizes & Publication

  • Awards – details TBA.
  • Every team with a valid submission will be invited to co‑author the joint challenge paper (MedIA / IEEE TMI).
  • An overview manuscript describing baseline results will appear on arXiv before the test phase closes.

Citation

@article{hamamci2024developing,
  title   = {Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author  = {Hamamci, Ibrahim Ethem and Er, Sezgin and others},
  journal = {arXiv preprint arXiv:2403.17834},
  year    = {2024}
}

Contact

Technical questions: open an issue or post on the challenge forum. Other inquiries: use “Help → Email organizers” on the challenge site.