Choose the rows you want
There are two ways to specify rows you want:
- Random sample: DatabaseCI picks a particular percentage of rows in a table, chosen at random
- Where clause: Enter an SQL
where ...clause for a table, to pick the exact rows you are interested in.
Or you can use both at the same time to get a union of the two criteria.
Rows in other tables relating to the rows you've chosen are always included. DatabaseCI does the tricky work for you of figuring out all the dependencies of foreign keys and including them.
For instance, if your
transactions table includes a
customer_id referencing the
customer table, those customers will also get included automatically.
DatabaseCI can also follow following foreign keys backwards - as might be necessary to handle a many-to-many relationship. See below.
Zero rows by default
If you don't specify a selection criteria for a particular table, no rows will be selected for that table - this is to keep the size of your subsets small by default.
Of course, rows chosen because because they are related to other rows you've chosen will still be selected (see immediately above).
The hierarchy of table_config is schemas->[schema_name]->[table_name], as in the following example:
api_key: c72cc5a97000d49bb33abd48ac730cc2 real_db_url: postgres://production.server.example.com/production copy_db_url: postgres:///subsetted schemas: public: order: where: 'ordered_at > current_date - 7' product: random_sample: 10
This is simply percentage of rows from 0 to 100 - "10" for a million row table will result in (approximately) 10 000 rows being chosen for the snapshot.
Rows referenced by those 10 000 rows will be included too.
To leave that table empty, just enter 0. And to select every row no matter what, enter 100.
public: users: sample_percent: 10
To include specific rows, just include the
where criteria for those rows, for instance to make a snapshot that includes only users named Alex, enter the following:
public: users: where: "firstname = 'Alex'"
Sometimes you may be interested in following foreign keys "backwards". A common scenario might involve a many-to-many relationship table.
For instance, a
order table linked by a
product_order many-to-many relationship table.
Foreign keys "flow" outwards from such a table, but you likely want to choose particular orders or products, and follow those relationships "inwards" from one base (order/customer) table towards the relationship table, then back "outwards" to the other table.
Such relationships aren't followed by default as they can lead to an explosion in the quantity of selected rows - but to enable them for a table, simply specify
backwards: true on the table containing the relevant foreign key column, like so:
public: order: sample_percent: 5 product_order: backwards: true
With this information, you should be able to tailor your configuration to get exactly subsets of exactly the shapes and sizes you're interested in.