Configuring snapshots
Choose the rows you want
There are two ways to specify rows you want:
- Random sample: DatabaseCI picks a particular percentage of rows in a table, chosen at random
- Where clause: Enter an SQL
where ...
clause for a table, to pick the exact rows you are interested in.
Or you can use both at the same time to get a union of the two criteria.
Handling relationships
Rows in other tables relating to the rows you've chosen are always included. DatabaseCI does the tricky work for you of figuring out all the dependencies of foreign keys and including them.
For instance, if your transactions
table includes a customer_id
referencing the customer
table, those customers will also get included automatically.
DatabaseCI can also follow following foreign keys backwards - as might be necessary to handle a many-to-many relationship. See below.
Zero rows by default
If you don't specify a selection criteria for a particular table, no rows will be selected for that table - this is to keep the size of your subsets small by default.
Of course, rows chosen because because they are related to other rows you've chosen will still be selected (see immediately above).
Config layout
The hierarchy of table_config is schemas->[schema_name]->[table_name], as in the following example:
api_key: c72cc5a97000d49bb33abd48ac730cc2
real_db_url: postgres://production.server.example.com/production
copy_db_url: postgres:///subsetted
schemas:
public:
order:
where: 'ordered_at > current_date - 7'
product:
random_sample: 10
Random sampling
This is simply percentage of rows from 0 to 100 - "10" for a million row table will result in (approximately) 10 000 rows being chosen for the snapshot.
Rows referenced by those 10 000 rows will be included too.
To leave that table empty, just enter 0. And to select every row no matter what, enter 100.
public:
users:
sample_percent: 10
where
clause
To include specific rows, just include the where
criteria for those rows, for instance to make a snapshot that includes only users named Alex, enter the following:
public:
users:
where: "firstname = 'Alex'"
Backwards/many-to-many relationships
Sometimes you may be interested in following foreign keys "backwards". A common scenario might involve a many-to-many relationship table.
For instance, a product
and order
table linked by a product_order
many-to-many relationship table.
Foreign keys "flow" outwards from such a table, but you likely want to choose particular orders or products, and follow those relationships "inwards" from one base (order/customer) table towards the relationship table, then back "outwards" to the other table.
Such relationships aren't followed by default as they can lead to an explosion in the quantity of selected rows - but to enable them for a table, simply specify backwards: true
on the table containing the relevant foreign key column, like so:
public:
order:
sample_percent: 5
product_order:
backwards: true
With this information, you should be able to tailor your configuration to get exactly subsets of exactly the shapes and sizes you're interested in.