The Sinkove Python SDK provides a simple, powerful interface for generating AI datasets programmatically. This guide covers everything you need to know to get started.
Prerequisites
Before using the SDK, ensure you have:
Python 3.12+ installed
API Key from the Sinkove dashboard
Organization ID (UUID format) from the Sinkove dashboard
API keys and Organization IDs are covered in the Get Started section.
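The three prerequisites above can be checked before any SDK call is made. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper and not part of the Sinkove SDK.

```python
import sys
import uuid


def check_prerequisites(api_key, organization_id, version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # The guide lists Python 3.12+ as the minimum interpreter version.
    if tuple(version) < (3, 12):
        problems.append("Python 3.12+ is required")
    if not api_key:
        problems.append("API key is missing")
    try:
        # uuid.UUID raises ValueError for strings that are not well-formed UUIDs.
        uuid.UUID(str(organization_id))
    except ValueError:
        problems.append("Organization ID is not a valid UUID")
    return problems
```

Calling it with the values copied from the dashboard and printing the returned list gives a quick sanity check before the first real request.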
Installation
Authentication
The SDK requires an API key for authentication:
Environment Variable (Recommended)
import os
import uuid
from sinkove import Client

# Set API key as environment variable
os.environ['SINKOVE_API_KEY'] = 'your-api-key-here'

# Initialize client
client = Client(uuid.UUID("your-organization-id"))
Never hardcode API keys in your source code. Use environment variables or
secure configuration management.
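One way to follow this advice is to read the key from the environment at startup and fail early with a clear message when it is absent. This is a sketch only: `require_api_key` is a hypothetical helper, and the only name taken from this guide is the `SINKOVE_API_KEY` variable.

```python
import os


def require_api_key(env=None, name="SINKOVE_API_KEY"):
    """Return the API key from the environment, or raise if it is unset or blank."""
    # Accept an explicit mapping so the function can be exercised without touching os.environ.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating a Client")
    return key
```

The key itself then lives in the shell profile, a secrets manager, or the deployment configuration rather than in the source tree.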
Basic Usage
import uuid
from sinkove import Client

# Initialize client
organization_id = uuid.UUID("your-organization-id")
model_id = uuid.UUID("your-model-id")
client = Client(organization_id)

# Create dataset
dataset = client.datasets.create(
    model_id=model_id,
    num_samples=20,
    args={"prompt": "chest x-ray showing pneumonia"}
)

# Wait for completion and download
dataset.wait()
dataset.download("medical_dataset.zip", strategy="replace")
print(f"Dataset {dataset.id} ready!")
Core Concepts
Client
The Client class is your main entry point to the SDK:

from sinkove import Client
import uuid

client = Client(uuid.UUID("your-organization-id"))

# Access organization properties
print(f"Organization ID: {client.id}")
print(f"Organization Name: {client.organization_name}")
Datasets
Datasets are the core resource in Sinkove. They go through several states during processing:
| State | Description |
| --- | --- |
| PENDING | Dataset creation request received |
| STARTED | Dataset generation in progress |
| READY | Dataset successfully generated and ready for download |
| FAILED | Dataset generation failed |
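The four states can be mirrored locally when application code needs to branch on them. The sketch below is an assumption-laden illustration: `DatasetState` and `is_terminal` are hypothetical names, and the guide does not show how the SDK itself represents `dataset.state`. Treating READY and FAILED as the terminal states is inferred from the table.

```python
from enum import Enum


class DatasetState(str, Enum):
    """Local mirror of the documented dataset states (not the SDK's own type)."""
    PENDING = "PENDING"
    STARTED = "STARTED"
    READY = "READY"
    FAILED = "FAILED"


# States after which no further processing is expected.
TERMINAL_STATES = {DatasetState.READY, DatasetState.FAILED}


def is_terminal(state):
    """True once a dataset has stopped processing (READY or FAILED)."""
    return DatasetState(state) in TERMINAL_STATES
```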
Dataset Operations
Creating Datasets
dataset = client.datasets.create(
    model_id=uuid.UUID("your-model-id"),
    num_samples=50,
    args={"prompt": "chest x-ray showing pneumonia"}
)
print(f"Created dataset: {dataset.id}")
Listing and Retrieving Datasets
# Get all datasets
datasets = client.datasets.list()

# Get specific dataset
dataset = client.datasets.get(uuid.UUID("dataset-id"))

# Check dataset properties
print(f"State: {dataset.state}")
print(f"Ready: {dataset.ready}")
print(f"Progress: {dataset.metadata.progress if dataset.metadata else 'N/A'}")
Downloading Datasets
The SDK provides flexible download options:
# Basic download (fails if file exists)
dataset.download("output.zip")

# Wait for completion then download
dataset.download("output.zip", wait=True, timeout=300)

# Download with file handling strategy
dataset.download("output.zip", strategy="replace")  # or "skip", "fail"
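The strategy names suggest how an existing file at the target path is handled. The sketch below spells out one plausible reading of "replace", "skip", and "fail" as plain local logic. It is an assumption about the semantics, not the SDK's implementation, and `should_write` is a hypothetical helper.

```python
from pathlib import Path


def should_write(path, strategy="fail"):
    """Decide whether a download to `path` should proceed under the given strategy."""
    if strategy not in {"replace", "skip", "fail"}:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not Path(path).exists():
        return True   # nothing in the way, always write
    if strategy == "replace":
        return True   # overwrite the existing file
    if strategy == "skip":
        return False  # keep the existing file and do nothing
    # Assumed meaning of "fail": refuse to touch an existing file.
    raise FileExistsError(f"{path} already exists")
```

Under this reading, "skip" makes repeated runs of the same script cheap, while "replace" suits jobs that must always end with fresh output.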
Monitoring Progress
# Wait for completion
dataset.wait(timeout=600)  # Wait max 10 minutes

# Manual status checking
if dataset.finished:
    if dataset.ready:
        print("Dataset is ready!")
    else:
        print(f"Failed with state: {dataset.state}")
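When `dataset.wait()` is not suitable, for example because progress should be logged between checks, manual status checking can be wrapped in a polling loop with a deadline. The following is a minimal sketch: `poll_until_finished` is a hypothetical helper, and it assumes only that the object returned by `fetch()` exposes the `finished` property shown above.

```python
import time


def poll_until_finished(fetch, timeout=600.0, interval=5.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call `fetch()` until the returned object reports finished, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        dataset = fetch()  # re-fetch so the state reflects the latest server-side status
        if dataset.finished:
            return dataset
        if clock() >= deadline:
            raise TimeoutError(f"dataset not finished after {timeout} seconds")
        sleep(interval)
```

A call such as `poll_until_finished(lambda: client.datasets.get(dataset_id))` would re-fetch the dataset on every iteration; the caller then inspects `ready` and `state` on the returned object to tell success from failure.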
Error Handling
Always implement proper error handling for production use:
# organization_id and model_id are defined as in Basic Usage
try:
    client = Client(organization_id)
    dataset = client.datasets.create(model_id, 50, {"prompt": "chest x-ray"})
    dataset.wait(timeout=1800)  # 30 minutes
    dataset.download("dataset.zip", strategy="replace")
except ValueError as e:
    print(f"Configuration error: {e}")
except TimeoutError:
    print("Dataset generation timed out")
except Exception as e:
    print(f"Error: {e}")
Advanced Features
Custom API Endpoint
import os

os.environ['SINKOVE_API_URL'] = 'https://api.your-instance.com'
client = Client(organization_id)
Multiple Organizations
from sinkove.connector import Connector
from sinkove.organizations.client import OrganizationClient

connector = Connector(api_key="your-api-key")
org_client = OrganizationClient(connector)

# List all organizations
organizations = org_client.list()
for org in organizations:
    print(f"{org.organization_name}: {len(org.datasets.list())} datasets")
Best Practices
Security: Use environment variables for API keys
Error Handling: Implement timeouts and proper exception handling
Performance: Use parallel operations for multiple datasets
Resource Management: Clean up downloaded files and monitor disk space
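The performance recommendation above can be sketched with a thread pool from the standard library. `run_in_parallel` is a hypothetical helper; it runs any per-item function concurrently and collects failures instead of stopping at the first one. Whether a single `Client` can safely be shared between threads is not stated in this guide, so treat that as an open assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_parallel(task, items, max_workers=4):
    """Run `task(item)` for each item on a thread pool; return (results, errors) keyed by item."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so outcomes can be reported per item.
        futures = {pool.submit(task, item): item for item in items}
        for future in as_completed(futures):
            item = futures[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # keep going so one failure does not hide the others
                errors[item] = exc
    return results, errors
```

Here `task` could be a function that takes a dataset ID, fetches the dataset, waits for it, and downloads it to a path derived from the ID. Items must be hashable because they are used as dictionary keys.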
Common Issues
| Problem | Solution |
| --- | --- |
| ValueError: An API key is required | Set SINKOVE_API_KEY environment variable |
| TimeoutError: Dataset processing timed out | Increase timeout or check dataset complexity |
| Exception: Failed to retrieve download URL | Ensure dataset is in READY state |
Next Steps