Subsetting (paid feature)

What is subsetting?

Subsetting is the process of extracting part of your database: either a specific slice or a representative random sample. The result is smaller and more manageable to work with, while preserving all the relationships within the data that are required for integrity and correctness.
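To make "maintaining relationships" concrete, here is a toy sketch in plain Python. It is not how databaseci works internally. The tables, rows, and the `sample_with_parents` helper are all made up for illustration. It samples a percentage of orders and then keeps every customer those orders reference, so no sampled row points at a missing parent.

```python
import random

# Toy in-memory "tables"; the names and rows are invented for this sketch.
customers = {1: "Ada", 2: "Grace", 3: "Linus", 4: "Margaret"}
orders = [
    {"id": 101, "customer_id": 1},
    {"id": 102, "customer_id": 2},
    {"id": 103, "customer_id": 2},
    {"id": 104, "customer_id": 4},
]


def sample_with_parents(orders, customers, sample_percent, seed=0):
    """Sample a percentage of orders and keep every customer they reference."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    count = max(1, round(len(orders) * sample_percent / 100))
    sampled_orders = rng.sample(orders, count)
    # Follow the foreign key so no sampled order points at a missing customer.
    needed_ids = {order["customer_id"] for order in sampled_orders}
    kept_customers = {cid: customers[cid] for cid in needed_ids}
    return sampled_orders, kept_customers


sampled_orders, kept_customers = sample_with_parents(
    orders, customers, sample_percent=50
)
```

A real subsetting run has to follow every foreign key in the schema, not just one, but the principle is the same: the sample decides which rows are wanted, and the relationships decide which other rows must come along.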

Subsetting keeps your data on your own infrastructure.

Although subsetting uses an API to plan each subsetting run, you run databaseci on your own hardware, within your own infrastructure. This means you never need to send your data to DatabaseCI's servers, or even share your database connection details with us. The only thing DatabaseCI sees is the structure of your database, never the actual content.

Easy, flexible config

Configure subsetting with a simple YAML file:

real_db_url: postgresql://server/production
copy_db_url: postgresql:///localsubset

public:
  order:
    sample_percent: 5

  product_order:
    backwards: true

Or incorporate subsetting into your scripts:

# `subset` is assumed to be already imported; the import line is not shown here.
subsetting_config = dict(
    real_db_url=...,
    copy_db_url=...,
    public=dict(
        order=dict(sample_percent=5),
        product_order=dict(backwards=True),
    ),
)

subset(
    subsetting_config
)
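If you build the config in a script, a small helper can keep the connection URLs and sample size as parameters. The following is a minimal sketch: `build_subsetting_config` is a hypothetical name, not part of databaseci, and the range check on the percentage is an assumption rather than documented behavior. The keys mirror the example above.

```python
def build_subsetting_config(real_db_url, copy_db_url, order_sample_percent=5):
    """Return a config dict shaped like the example above (hypothetical helper)."""
    # Assumption: a sample percentage only makes sense in the range (0, 100].
    if not 0 < order_sample_percent <= 100:
        raise ValueError("order_sample_percent must be in (0, 100]")
    return dict(
        real_db_url=real_db_url,
        copy_db_url=copy_db_url,
        public=dict(
            order=dict(sample_percent=order_sample_percent),
            product_order=dict(backwards=True),
        ),
    )


config = build_subsetting_config(
    "postgresql://server/production",
    "postgresql:///localsubset",
    order_sample_percent=10,
)
```

The resulting `config` can then be passed to `subset(...)` in the same way as `subsetting_config` above.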

Subsetting is a paid service included as part of a databaseci commercial subscription. For more details, or to sign up, see Licensing and pricing.