VLM3D Challenge – Task 4: Text‑Conditional CT Generation
Welcome to Task 4 of the Vision‑Language Modeling in 3D Medical Imaging (VLM3D) Challenge. In this task, teams must synthesize realistic 3D chest CT volumes from free‑form radiology text prompts.
Contents
- Overview
- Dataset
- Task Objective
- Participation Rules
- Evaluation & Ranking
- Prizes & Publication
- Citation
- Contact
Overview
Synthetic 3D imaging unlocks:
- Data augmentation for scarce pathologies
- Privacy‑preserving sharing of realistic cases
- Pre‑training of downstream models
Task 4 challenges participants to convert clinical text (radiology reports) into high‑fidelity chest CT scans that faithfully reflect the described anatomy and pathology.
Dataset
Split | Patients | CT Volumes | Reports | Source |
---|---|---|---|---|
Train | 20 000 | ≈ 47 k | 20 000 | Istanbul Medipol University |
Validation | 1 304 | ≈ 3 k | 1 564 | Istanbul Medipol University |
Internal Test | 2 000 | 2 000 | hidden | Istanbul Medipol University |
External Test | 1 024 | 1 024 | hidden | Boston University Hospital |
Each report serves as the conditioning prompt, and each NIfTI volume is the corresponding target output.
Task Objective
Given a radiology report, generate a 3D NIfTI chest CT that:
- Matches anatomical context (lungs, mediastinum, pleura)
- Reflects all described pathologies (e.g., “right lower‑lobe nodule 5 mm”)
- Exhibits realistic Hounsfield unit (HU) distributions, voxel spacing, and slice thickness
The output volume must keep the voxel spacing specified in the submission template.
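The spacing requirement can be checked before submission. The sketch below is one possible check, not the organizers' validator: it assumes the volume's 4×4 voxel-to-world affine is available (for example from a NIfTI header), and the function names, the tolerance, and the spacing values in the usage note are hypothetical. It relies only on the fact that voxel spacing equals the Euclidean norm of each of the first three affine columns.

```python
import math


def voxel_spacing(affine):
    """Return the voxel spacing (in mm) along each image axis.

    `affine` is a 4x4 voxel-to-world matrix given as nested sequences.
    The spacing of axis k is the Euclidean norm of column k of the
    upper-left 3x3 block, which also holds for rotated volumes.
    """
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


def spacing_matches(affine, expected, tol=1e-3):
    """Return True if every axis spacing is within `tol` mm of `expected`.

    `expected` is a 3-tuple taken from the submission template
    (the template values themselves are not given in this document).
    """
    if len(expected) != 3:
        raise ValueError("expected spacing must have three components")
    return all(
        abs(actual - target) <= tol
        for actual, target in zip(voxel_spacing(affine), expected)
    )
```

For instance, with a hypothetical template spacing of `(0.75, 0.75, 1.5)` mm, `spacing_matches(affine, (0.75, 0.75, 1.5))` would flag a generated volume whose header was written with a different resolution.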
Participation Rules
- Inference: Fully automatic – no manual editing.
- Training data: CT‑RATE + public data/models permitted.
- External masks: Allowed, but output must be a full CT volume.
- Submissions: One compressed archive per scan, at most one run per day; only the last run counts.
- Organizers: Visible on leaderboard, not prize‑eligible.
Evaluation & Ranking
Generation Metrics
Metric | Role |
---|---|
FVD (CT‑Net) | CT‑specific Fréchet distance (anatomical realism) |
CT‑CLIP Score | Text‑image and image‑image alignment |
FID | Global visual realism |
Metrics are averaged over the test set.
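For orientation, FVD and FID both belong to the family of Fréchet distances. The standard form, shown below, compares Gaussians fitted to feature embeddings of real and generated volumes. The symbols are introduced here for illustration only; the feature extractors and the exact implementation used by the organizers are not specified in this document.

```latex
% (\mu_r, \Sigma_r): mean and covariance of features from real volumes
% (\mu_g, \Sigma_g): mean and covariance of features from generated volumes
d^2\bigl((\mu_r, \Sigma_r), (\mu_g, \Sigma_g)\bigr)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
      - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one; the distance is zero when both means and covariances coincide.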
Final Ranking
- Compute all metrics above.
- For each metric, run a two‑sided permutation test (10 000 samples) between team pairs.
- Award 1 point per significant win; sum points across metrics.
- Order teams by total points (higher = better). Ties share the same rank.
Missing volumes receive the worst score for that scan.
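The ranking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the organizers' scoring code: it assumes each metric yields one score per scan, uses a paired sign-flip permutation test, and takes a significance level of 0.05, none of which is specified in this document. All function names and the `worst` placeholder value are hypothetical.

```python
import random
from itertools import combinations


def fill_missing(per_scan, scan_ids, worst):
    """Build a score list over all scans, using `worst` for missing volumes."""
    return [per_scan.get(scan_id, worst) for scan_id in scan_ids]


def paired_permutation_pvalue(a, b, n_samples=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test on per-scan differences."""
    if len(a) != len(b):
        raise ValueError("both teams need one score per scan")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(n_samples):
        # Under the null hypothesis the two team labels are exchangeable
        # for each scan, so every difference keeps or flips its sign.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_samples + 1)


def rank_points(scores, higher_is_better, alpha=0.05, n_samples=10_000):
    """Award one point per significant pairwise win, summed across metrics.

    scores: {metric: {team: [per-scan scores]}}
    higher_is_better: {metric: bool}
    """
    teams = sorted({team for per_team in scores.values() for team in per_team})
    points = {team: 0 for team in teams}
    for metric, per_team in scores.items():
        for a, b in combinations(sorted(per_team), 2):
            p = paired_permutation_pvalue(per_team[a], per_team[b], n_samples)
            if p >= alpha:
                continue  # no significant difference, no point awarded
            mean_a = sum(per_team[a]) / len(per_team[a])
            mean_b = sum(per_team[b]) / len(per_team[b])
            a_wins = (mean_a > mean_b) == higher_is_better[metric]
            points[a if a_wins else b] += 1
    return points
```

Teams would then be ordered by descending `points`, with equal totals sharing a rank. A team with missing volumes can be handled by passing its per-scan dictionary through `fill_missing` first, with `worst` set to whatever value the organizers define as the worst score for that metric.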
Prizes & Publication
- Awards – details TBA.
- Every team with a valid submission will be invited to co‑author the joint challenge paper.
- An overview manuscript describing baseline results will appear on arXiv before the test phase closes.
Citation
@misc{hamamci2024foundation,
  title={Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography},
  author={Ibrahim Ethem Hamamci and Sezgin Er and Furkan Almas and Ayse Gulnihan Simsek and Sevval Nil Esirgun and Irem Dogan and Muhammed Furkan Dasdelen and Omer Faruk Durugol and Bastian Wittmann and Tamaz Amiranashvili and Enis Simsar and Mehmet Simsar and Emine Bensu Erdemir and Abdullah Alanbay and Anjany Sekuboyina and Berkan Lafci and Christian Bluethgen and Mehmet Kemal Ozdemir and Bjoern Menze},
  year={2024},
  eprint={2403.17834},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2403.17834},
}

@inproceedings{hamamci2024generatect,
  title={Generatect: Text-conditional generation of 3d chest ct volumes},
  author={Hamamci, Ibrahim Ethem and Er, Sezgin and Sekuboyina, Anjany and Simsar, Enis and Tezcan, Alperen and Simsek, Ayse Gulnihan and Esirgun, Sevval Nil and Almas, Furkan and Do{\u{g}}an, Irem and Dasdelen, Muhammed Furkan and others},
  booktitle={European Conference on Computer Vision},
  pages={126--143},
  year={2024},
  organization={Springer}
}
Contact
Technical questions: open an issue or post on the challenge forum. Other inquiries: use “Help → Email organizers” on the challenge site.