The Sinkove Python SDK provides a programmatic interface for generating AI datasets. This guide covers installation, authentication, dataset operations, error handling, and advanced features.

Prerequisites

Before using the SDK, ensure you have:
  • Python 3.12+ installed
  • An API key from the Sinkove dashboard
  • An Organization ID (UUID format) from the Sinkove dashboard

API keys and Organization IDs are covered in the Get Started section.

Installation

pip install sinkove-sdk

Authentication

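The SDK reads the API key from the SINKOVE_API_KEY environment variable; the client itself takes only the organization ID:
import os
import uuid
from sinkove import Client

# Illustration only: in real code, export SINKOVE_API_KEY in your shell or
# secrets manager instead of assigning it in source
os.environ['SINKOVE_API_KEY'] = 'your-api-key-here'

# Initialize client (replace the placeholder with your organization's UUID)
client = Client(uuid.UUID("your-organization-id"))

Never hardcode API keys in your source code. Use environment variables or secure configuration management.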
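
As a minimal sketch of keeping credentials out of source, the helper below reads both values from the environment and validates the organization ID before any client is built. Only SINKOVE_API_KEY comes from this guide; the SINKOVE_ORG_ID variable and the load_settings function are hypothetical names chosen for illustration.

```python
import os
import uuid


def load_settings(env=None):
    """Return (api_key, organization_id) read from an environment-style mapping.

    SINKOVE_ORG_ID is a hypothetical variable name chosen for this sketch;
    SINKOVE_API_KEY is the variable named in this guide.
    """
    env = os.environ if env is None else env
    api_key = env.get("SINKOVE_API_KEY", "").strip()
    if not api_key:
        raise ValueError("SINKOVE_API_KEY is not set")
    raw_org_id = env.get("SINKOVE_ORG_ID", "")
    try:
        # uuid.UUID raises ValueError on malformed input
        organization_id = uuid.UUID(raw_org_id)
    except ValueError:
        raise ValueError(f"SINKOVE_ORG_ID is not a valid UUID: {raw_org_id!r}") from None
    return api_key, organization_id
```

The returned organization_id can then be passed straight to Client(organization_id).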

Basic Usage

import uuid
from sinkove import Client

# Initialize client
organization_id = uuid.UUID("your-organization-id")
model_id = uuid.UUID("your-model-id")
client = Client(organization_id)

# Create dataset
dataset = client.datasets.create(
    model_id=model_id,
    num_samples=20,
    args={"prompt": "chest x-ray showing pneumonia"}
)

# Wait for completion and download
dataset.wait()
dataset.download("medical_dataset.zip", strategy="replace")
print(f"Dataset {dataset.id} ready!")

Core Concepts

Client

The Client class is your main entry point to the SDK:
from sinkove import Client
import uuid

client = Client(uuid.UUID("your-organization-id"))

# Access organization properties
print(f"Organization ID: {client.id}")
print(f"Organization Name: {client.organization_name}")

Datasets

Datasets are the core resource in Sinkove. They go through several states during processing:
State      Description
PENDING    Dataset creation request received
STARTED    Dataset generation in progress
READY      Dataset successfully generated and ready for download
FAILED     Dataset generation failed
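
The lifecycle above can be modeled as a small enum. This is an illustrative sketch, not the SDK's own type: DatasetState, is_finished, is_ready, and the string values are assumptions, and treating READY and FAILED as the only terminal states is inferred from the state descriptions.

```python
from enum import Enum


class DatasetState(Enum):
    """Hypothetical mirror of the four states listed above."""
    PENDING = "PENDING"
    STARTED = "STARTED"
    READY = "READY"
    FAILED = "FAILED"


# Assumption: READY and FAILED are the only terminal states.
TERMINAL_STATES = {DatasetState.READY, DatasetState.FAILED}


def is_finished(state: DatasetState) -> bool:
    """True once generation has stopped, whether it succeeded or failed."""
    return state in TERMINAL_STATES


def is_ready(state: DatasetState) -> bool:
    """True only when the dataset can be downloaded."""
    return state is DatasetState.READY
```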

Dataset Operations

Creating Datasets

dataset = client.datasets.create(
    model_id=uuid.UUID("your-model-id"),
    num_samples=50,
    args={"prompt": "chest x-ray showing pneumonia"}
)
print(f"Created dataset: {dataset.id}")

Listing and Retrieving Datasets

# Get all datasets
datasets = client.datasets.list()

# Get specific dataset
dataset = client.datasets.get(uuid.UUID("dataset-id"))

# Check dataset properties
print(f"State: {dataset.state}")
print(f"Ready: {dataset.ready}")
print(f"Progress: {dataset.metadata.progress if dataset.metadata else 'N/A'}")

Downloading Datasets

The SDK provides flexible download options:
# Basic download (fails if file exists)
dataset.download("output.zip")

# Wait for completion then download
dataset.download("output.zip", wait=True, timeout=300)

# Download with file handling strategy
dataset.download("output.zip", strategy="replace")  # or "skip", "fail"
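
To reason about the three strategies, it can help to spell them out as a local decision function. The sketch below is a stand-in, not the SDK's implementation: should_write is a hypothetical helper, and the semantics are assumed from the option names and the "fails if file exists" comment above.

```python
from pathlib import Path


def should_write(path, strategy="fail"):
    """Decide whether a download may write to `path`.

    Assumed semantics, inferred from the option names:
      "replace" -> always write, overwriting any existing file
      "skip"    -> write only if the file does not exist yet
      "fail"    -> raise FileExistsError if the file already exists
    """
    if strategy not in {"replace", "skip", "fail"}:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    exists = Path(path).exists()
    if not exists or strategy == "replace":
        return True
    if strategy == "skip":
        return False
    raise FileExistsError(f"{path} already exists")
```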

Monitoring Progress

# Wait for completion
dataset.wait(timeout=600)  # Wait max 10 minutes

# Manual status checking
if dataset.finished:
    if dataset.ready:
        print("Dataset is ready!")
    else:
        print(f"Failed with state: {dataset.state}")
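
If you want to log progress while waiting rather than block in dataset.wait(), manual polling might look like the sketch below. poll_until_finished and its parameters are hypothetical; it relies only on the finished attribute shown above. Because this guide does not say whether a dataset object updates in place, the sketch assumes the caller supplies a refresh function that returns an up-to-date object.

```python
import time


def poll_until_finished(refresh, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Call `refresh()` until the returned dataset reports `finished`.

    `refresh` is a caller-supplied function returning a fresh dataset object,
    for example `lambda: client.datasets.get(dataset_id)`.
    Raises TimeoutError if `timeout` seconds pass first.
    """
    deadline = time.monotonic() + timeout
    while True:
        dataset = refresh()
        if dataset.finished:
            return dataset
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Dataset not finished after {timeout} seconds")
        sleep(interval)
```

The sleep parameter exists only so the loop can be tested without real delays.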

Error Handling

Always implement proper error handling for production use:
try:
    client = Client(organization_id)
    dataset = client.datasets.create(model_id=model_id, num_samples=50, args={"prompt": "chest x-ray"})
    dataset.wait(timeout=1800)  # 30 minutes
    dataset.download("dataset.zip", strategy="replace")

except ValueError as e:
    print(f"Configuration error: {e}")
except TimeoutError:
    print("Dataset generation timed out")
except Exception as e:
    print(f"Error: {e}")
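
The same pattern can be wrapped in a function that reports the outcome instead of printing it. generate_dataset is a hypothetical helper; it calls only the methods shown in this guide and accepts any client-like object, so it can be exercised with a stub.

```python
def generate_dataset(client, model_id, prompt, path, num_samples=50, timeout=1800):
    """Create, wait for, and download a dataset; return (ok, message).

    Uses only the calls shown in this guide: datasets.create, wait, download.
    """
    try:
        dataset = client.datasets.create(
            model_id=model_id,
            num_samples=num_samples,
            args={"prompt": prompt},
        )
        dataset.wait(timeout=timeout)
        dataset.download(path, strategy="replace")
    except ValueError as exc:
        return False, f"Configuration error: {exc}"
    except TimeoutError:
        return False, "Dataset generation timed out"
    except Exception as exc:  # last resort, mirrors the broad handler above
        return False, f"Error: {exc}"
    return True, f"Dataset {dataset.id} saved to {path}"
```

Returning a status tuple lets the caller decide whether to retry, log, or exit.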

Advanced Features

Custom API Endpoint

import os
from sinkove import Client

# Point the SDK at a custom endpoint, then create the client
os.environ['SINKOVE_API_URL'] = 'https://api.your-instance.com'
client = Client(organization_id)

Multiple Organizations

from sinkove.connector import Connector
from sinkove.organizations.client import OrganizationClient

connector = Connector(api_key="your-api-key")
org_client = OrganizationClient(connector)

# List all organizations
organizations = org_client.list()
for org in organizations:
    print(f"{org.organization_name}: {len(org.datasets.list())} datasets")

Best Practices

  • Security: Use environment variables for API keys
  • Error Handling: Implement timeouts and proper exception handling
  • Performance: Use parallel operations for multiple datasets
  • Resource Management: Clean up downloaded files and monitor disk space
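
The parallel-operations advice could be sketched with a thread pool, submitting one creation request per prompt. create_many is a hypothetical helper, and the sketch assumes the client object can be shared across threads, which this guide does not state.

```python
from concurrent.futures import ThreadPoolExecutor


def create_many(client, model_id, prompts, num_samples=20, max_workers=4):
    """Submit one dataset per prompt concurrently; return results in prompt order.

    Assumption: the client object can be shared across threads.
    """
    def create_one(prompt):
        return client.datasets.create(
            model_id=model_id,
            num_samples=num_samples,
            args={"prompt": prompt},
        )

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order; a worker's exception is
        # re-raised when its result is reached
        return list(pool.map(create_one, prompts))
```

Each returned dataset can then be waited on and downloaded as shown in Dataset Operations.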

Common Issues

Problem                                       Solution
ValueError: An API key is required            Set SINKOVE_API_KEY environment variable
TimeoutError: Dataset processing timed out    Increase timeout or check dataset complexity
Exception: Failed to retrieve download URL    Ensure dataset is in READY state

Next Steps