Synthetic Data Engineer - Medical Imaging
Actively Interviewing
This organisation is scheduling interviews as applications come in and is ready to hire as soon as it finds the right person. Don't miss your opportunity: apply now!
MednTech
We provide an Android AI tool that analyzes cervical images to support frontline screening decisions.
About MednTech
MednTech is a nonprofit organization developing AI-powered tools to improve early detection of cervical cancer and expand access to care in underserved communities, particularly in Africa. We equip frontline healthcare providers with accessible technology and partner with local health systems to improve outcomes and empower patients. Our diagnostic model is already deployed in real-world settings in Rwanda. We are now seeking volunteers to help improve its performance and robustness before our next deployment.
Why Volunteer With Us
We are early-stage, moving fast, and building something that matters. You will have real ownership over technical decisions that directly shape a tool going into the field. Cervical cancer is one of the most preventable cancers in the world, yet it remains a leading cause of cancer death in low-resource settings simply because screening does not reach the people who need it. The tool you help build changes that. If you want your code to save lives, this is that opportunity!
Role Overview
We are looking for a data professional to design and implement synthetic data generation pipelines. The synthetic images these pipelines produce will be used to augment limited cervical image datasets and improve the robustness of our cervical cancer screening model.
Key Responsibilities
- Develop synthetic image pipelines using diffusion models (Stable Diffusion, DreamBooth, LoRA) or GAN-based approaches
- Generate class-balanced datasets, particularly for underrepresented abnormal cases
- Design and run experiments comparing real vs synthetic vs hybrid datasets
- Evaluate synthetic data quality using FID, KID, and downstream model performance
- Implement prompt engineering and conditioning strategies for medically realistic outputs
- Perform domain gap analysis between real and synthetic data
- Collaborate with ML engineers to integrate synthetic data into training pipelines
Required Skills
- Experience with PyTorch and Hugging Face Diffusers or GAN frameworks
- Strong understanding of data augmentation vs synthetic generation tradeoffs
- Experience evaluating generative models (FID, distribution alignment)
- Familiarity with medical imaging challenges (preferred but not required)
Preferred Qualifications
- Experience with GAN-based models and LoRA fine-tuning pipelines
- Prior work with imbalanced or medical datasets
- Understanding of bias in synthetic data
- All roles are highly collaborative and will work closely across the MednTech AI pipeline
- Experience with healthcare AI, low-resource environments, or global health applications is a strong plus
- Candidates should be comfortable working in fast-paced, early-stage environments
Minimum Hours per Week:
4-6 hours per week
Duration:
One-off project
The client requests no contact from agencies or media sales.