Overview
Sinkove datasets contain synthetic medical images generated by diffusion models. Each dataset is customizable and intended for research, AI model training, and education.
Dataset Types
Medical Imaging Modalities
Sinkove supports generation of various medical imaging types:
- Chest X-rays - Most common modality with multiple specialized models
- Other Modalities - Additional imaging types available based on model selection
All datasets are delivered as ZIP archives containing:
- Images - High-resolution synthetic medical images (typically 512x512 pixels)
- Metadata - JSON files with image annotations and generation parameters
- Model Cards - Documentation describing the AI models used for generation
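To check what a downloaded archive actually contains, one option is to count its entries by file extension. This is a minimal sketch: the internal layout and file names of the archive are not specified here, so the code assumes nothing beyond a standard ZIP file, and `summarize_archive` is a hypothetical helper rather than part of the Sinkove SDK.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_archive(zip_path):
    """Count archive entries by lowercase file extension.

    Hypothetical helper: it makes no assumption about how the
    archive is organized internally.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            suffix = PurePosixPath(info.filename).suffix.lower()
            counts[suffix or "(no extension)"] += 1
    return dict(counts)
```

Calling `summarize_archive("dataset.zip")` on a downloaded file gives a quick view of how many images and JSON metadata files were delivered.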
Creating Datasets
Configuration Options
When creating a dataset, you can specify:
Model Selection
- Choose from available generative models optimized for different scenarios
- Each model may specialize in specific conditions or imaging characteristics
Prompts and Descriptions
- Use clinical terminology for accurate generation
- Examples: “Severe cardiomegaly”, “Bilateral pneumonia”, “Normal chest X-ray”
- Multiple prompts can be used to create diverse datasets
Dataset Size
- Currently limited to approximately 1000 images per dataset
- Larger datasets require more generation time
Quality Settings
- Configure generation parameters for your specific research needs
- Balance image quality against generation speed
Start with smaller datasets (10-50 images) to test your configuration before generating larger collections.
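When several prompts share one dataset, it can help to plan how many images each prompt should receive before submitting anything. The sketch below only does the arithmetic: `plan_prompt_counts` and `MAX_IMAGES` are hypothetical names, and how the platform itself maps prompts to image counts is not described here.

```python
MAX_IMAGES = 1000  # approximate per-dataset limit stated above


def plan_prompt_counts(prompts, total_images):
    """Split total_images as evenly as possible across prompts."""
    if not prompts:
        raise ValueError("at least one prompt is required")
    if not 1 <= total_images <= MAX_IMAGES:
        raise ValueError(f"total_images must be between 1 and {MAX_IMAGES}")
    base, extra = divmod(total_images, len(prompts))
    # The first `extra` prompts receive one additional image.
    return {p: base + (1 if i < extra else 0) for i, p in enumerate(prompts)}


# A small pilot run of 20 images across three prompts.
pilot = plan_prompt_counts(
    ["Severe cardiomegaly", "Bilateral pneumonia", "Normal chest X-ray"], 20
)
print(pilot)
```

Once the pilot images look right, the same function can be called with a larger total, up to the limit.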
Dataset Status Lifecycle
Status Phases
- Pending - Dataset creation request is queued
- Processing - AI models are actively generating images
- Ready - Dataset is complete and available for download
- Failed - Generation encountered an error
- Expired - Dataset is no longer available (after retention period)
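The phases above can be modeled in client code as an enumeration, which makes it easy to decide when generation is no longer running. This is a sketch under assumptions: the lowercase string values are guesses at what the API returns, and `DatasetStatus` and `is_finished` are illustrative names, not SDK types.

```python
from enum import Enum


class DatasetStatus(Enum):
    # String values are assumptions; check what the API actually returns.
    PENDING = "pending"
    PROCESSING = "processing"
    READY = "ready"
    FAILED = "failed"
    EXPIRED = "expired"


# States in which generation is no longer running.
FINISHED = {DatasetStatus.READY, DatasetStatus.FAILED, DatasetStatus.EXPIRED}


def is_finished(status):
    """Return True when there is no point in polling any further."""
    return DatasetStatus(status.lower()) in FINISHED
```

An unknown status string raises `ValueError`, which surfaces unexpected values early instead of silently polling forever.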
Monitoring Progress
Track your dataset status through:
- Web Dashboard - Real-time status updates in the platform
- Python SDK - Programmatic status checking
- API Endpoints - Direct REST API status queries
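import time

# Assumes `client` is an already-initialized Sinkove SDK client
# Check dataset status
dataset = client.dataset("dataset_id")
print(f"Status: {dataset.status}")
# Wait for completion, checking every 30 seconds
while not dataset.ready():
    time.sleep(30)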
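A bare polling loop never stops if generation fails or stalls. The following sketch adds a timeout and an explicit failure path. It takes any zero-argument callable that returns a status string, so it does not depend on a particular SDK method. The lowercase status names and the helper name `wait_until_done` are assumptions.

```python
import time


def wait_until_done(get_status, timeout=3600, interval=30, sleep=time.sleep):
    """Poll get_status() until the dataset is ready, fails, or time runs out.

    get_status is any zero-argument callable returning a status string;
    the lowercase status names below are assumptions.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = str(get_status()).lower()
        if status == "ready":
            return status
        if status in {"failed", "expired"}:
            raise RuntimeError(f"dataset ended in status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        sleep(interval)
```

With the snippet above, a call might look like `wait_until_done(lambda: client.dataset("dataset_id").status)`, provided `status` is a plain string; if it is some other type, adapt the callable accordingly.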
Best Practices
Research Applications
Model Training
- Use diverse prompts to create balanced training sets
- Include both normal and pathological cases
- Consider augmenting real datasets with synthetic data
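When augmenting real data, it is useful to cap the synthetic share and keep every sample's origin on record. The sketch below shows one way to do that; the 30% default, the record fields, and the function name `mix_real_and_synthetic` are illustrative choices, not recommendations from Sinkove.

```python
import random


def mix_real_and_synthetic(real, synthetic, synthetic_fraction=0.3, seed=0):
    """Build a shuffled training list with a capped share of synthetic items.

    real and synthetic are lists of (path, label) pairs; every output
    record carries a "source" field so synthetic samples stay identifiable.
    """
    if not 0 <= synthetic_fraction < 1:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Number of synthetic items needed to form synthetic_fraction of the total.
    wanted = round(len(real) * synthetic_fraction / (1 - synthetic_fraction))
    chosen = rng.sample(synthetic, min(wanted, len(synthetic)))
    records = [{"path": p, "label": lbl, "source": "real"} for p, lbl in real]
    records += [
        {"path": p, "label": lbl, "source": "synthetic"} for p, lbl in chosen
    ]
    rng.shuffle(records)
    return records
```

Keeping the `source` field through training and evaluation also makes it straightforward to report results on real data only.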
Validation Studies
- Generate datasets with known ground truth for algorithm testing
- Create edge cases that are rare in real clinical data
- Test model robustness with synthetic variations
Educational Use
- Create teaching datasets with clear diagnostic features
- Generate rare conditions for student training
- Provide consistent, high-quality examples
Technical Considerations
Data Management
- Download datasets promptly to avoid expiration
- Organize datasets with clear naming conventions
- Maintain metadata for reproducibility
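One way to keep a dataset reproducible is to store a small manifest next to each downloaded archive. The sketch below records the dataset ID, model, prompts, and a SHA-256 checksum. The field names and the `.manifest.json` suffix are assumptions made for illustration; they are not a Sinkove format.

```python
import hashlib
import json
from datetime import datetime, timezone


def write_manifest(zip_path, dataset_id, prompts, model_name):
    """Write <zip_path>.manifest.json describing how the archive was made."""
    digest = hashlib.sha256()
    with open(zip_path, "rb") as f:
        # Hash in 1 MiB chunks so large archives are not loaded into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    manifest = {
        "dataset_id": dataset_id,
        "model": model_name,
        "prompts": list(prompts),
        "sha256": digest.hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "synthetic": True,  # mark the data as AI-generated
    }
    manifest_path = f"{zip_path}.manifest.json"
    with open(manifest_path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest_path
```

The checksum lets you confirm later that an archive has not been altered or truncated since download.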
Quality Assurance
- Review sample images before generating large datasets
- Validate that synthetic images meet your research requirements
- Consider expert review for clinical accuracy
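To review a sample without unpacking an entire archive, a few images can be extracted at random. This is a sketch only: the image extensions are assumed, since the delivered file format is not stated here, and `sample_for_review` is a hypothetical helper.

```python
import random
import zipfile

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg")  # assumed image formats


def sample_for_review(zip_path, out_dir, k=10, seed=0):
    """Extract up to k randomly chosen images from the archive into out_dir."""
    with zipfile.ZipFile(zip_path) as archive:
        images = sorted(
            name for name in archive.namelist()
            if name.lower().endswith(IMAGE_SUFFIXES)
        )
        picked = random.Random(seed).sample(images, min(k, len(images)))
        for name in picked:
            archive.extract(name, out_dir)
    return picked
```

A fixed `seed` makes the selection repeatable, so the same images can be shown to several reviewers.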
Ethics and Compliance
- Use synthetic data responsibly in research contexts
- Clearly label synthetic data in publications
- Follow institutional guidelines for AI-generated content
Storage and Retention
Download Requirements
- Immediate Download - Download datasets upon completion
- Retention Period - Datasets may be automatically deleted after a specified time
- Storage Limits - Account storage quotas may apply
Local Management
# Organize downloaded datasets
import os
from datetime import datetime

def organize_dataset(dataset_id, download_path="./datasets"):
    """Download and organize dataset with timestamp"""
    # Create organized directory structure
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dataset_dir = os.path.join(download_path, f"{dataset_id}_{timestamp}")
    os.makedirs(dataset_dir, exist_ok=True)
    # Download dataset (assumes `client` is an initialized Sinkove SDK client)
    dataset = client.dataset(dataset_id)
    if dataset.ready():
        filepath = os.path.join(dataset_dir, "dataset.zip")
        dataset.save(filepath)
        print(f"Dataset saved to: {filepath}")
    return dataset_dir
Troubleshooting
Common Issues
Generation Failures
- Check prompt clarity and clinical accuracy
- Verify model compatibility with requested content
- Reduce dataset size if memory limitations occur
Download Problems
- Ensure stable internet connection for large files
- Verify sufficient local storage space
- Check API key permissions
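Two of these checks can be automated with the Python standard library: confirming free disk space before a download and verifying the archive afterwards. The helper names below are hypothetical, and the expected archive size has to come from your own estimate, since no size information is described here.

```python
import shutil
import zipfile


def has_free_space(directory, required_bytes):
    """Return True if the filesystem holding directory has enough free space."""
    return shutil.disk_usage(directory).free >= required_bytes


def archive_is_intact(zip_path):
    """Return True if the file is a ZIP whose members all pass their CRC check."""
    if not zipfile.is_zipfile(zip_path):
        return False
    try:
        with zipfile.ZipFile(zip_path) as archive:
            # testzip() returns the first corrupt member name, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

If `archive_is_intact` returns `False` after a download, the transfer was most likely interrupted and downloading again is the simplest fix.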
Quality Issues
- Refine prompts with more specific medical terminology
- Try different models for better results
- Adjust generation parameters if available
Getting Support
For dataset-related issues:
- Review generation logs in your dashboard
- Contact support with specific dataset IDs
- Provide detailed descriptions of quality concerns