Usage and Billing FAQs

What counts as a source?

Any system from which Daton extracts data is considered a source. For example, Facebook and Shopify are each a source.

What counts as an integration?

A source can have multiple integrations.

For example, suppose Shopify is your source and you run separate Shopify stores for several countries. To replicate data from each store, you set up one integration per store. Each integration counts towards the integration quota of your plan.

What counts as a replicated row?

Any row that is appended (inserted) or upserted (updated or overwritten) in the warehouse counts as a replicated row in Daton.

Daton supports multiple loading mechanisms and users have complete control over this behavior in most sources.

Users can append data
Users can un-nest data and create child tables in the warehouse

Most common scenarios for replication:

1. A new row is a row that has never been replicated through Daton. This is a common occurrence when you use append mode for loading data.

2. An updated row is an existing row that has been changed in the warehouse. This is a common occurrence when you use upsert mode for loading data.

3. A sub-row or un-nested row is a row that is created by un-nesting nested data structures. Child tables are created and kept up to date, and their rows count towards billing.

4. A copy of an existing row. For example:

Rows in tables that are replicated in full during each replication job or
Rows replicated as a result of resetting Replication Keys or
Rows replicated as a result of resetting loading mode or
Rows replicated as a result of re-loading data for any table.
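
The four scenarios above can be tallied with a minimal Python sketch. The JobResult class, its field names, and the sample numbers are hypothetical and only illustrate how each kind of row adds to the replicated row count; they are not part of Daton.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Hypothetical summary of one replication job for one table."""
    appended: int = 0    # new rows inserted (append mode)
    upserted: int = 0    # existing rows updated or overwritten (upsert mode)
    child_rows: int = 0  # rows written to child tables by un-nesting
    reloaded: int = 0    # copies of existing rows (full loads, resets, re-loads)


def replicated_rows(job: JobResult) -> int:
    """Every row written to the warehouse counts, whatever the scenario."""
    return job.appended + job.upserted + job.child_rows + job.reloaded


# Illustrative numbers only.
jobs = [
    JobResult(appended=1000),
    JobResult(upserted=200, child_rows=600),
    JobResult(reloaded=5000),
]
total = sum(replicated_rows(job) for job in jobs)
print(total)  # 6800
```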

How does Un-nesting impact billing?

In most sources, users have granular, column-level control over how data is loaded into the warehouse. For most destinations, the default behavior is to un-nest the JSON data provided by the source up to two levels. Rows in the child tables created by un-nesting count as replicated rows.

If you don't want a particular integration, table, or column to be un-nested, use the un-nesting controls on the integration setup pages to change this behavior.
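
To see why un-nesting increases the row count, here is a rough Python sketch. It assumes that each element of a nested list becomes one row in a child table and that nested lists are followed only down to a maximum number of levels. The count_rows function and the sample record are hypothetical; Daton's actual un-nesting logic may differ.

```python
def count_rows(record: dict, max_levels: int = 2) -> int:
    """Count rows written for one source record: one parent row plus one
    child row per element of each nested list, down to max_levels levels."""

    def child_rows(obj: dict, level: int) -> int:
        if level > max_levels:
            return 0  # deeper structures are left nested
        count = 0
        for value in obj.values():
            if isinstance(value, list):
                for item in value:
                    count += 1  # one child-table row per list element
                    if isinstance(item, dict):
                        count += child_rows(item, level + 1)
        return count

    return 1 + child_rows(record, 1)


# Hypothetical order with two line items, one of which has two tax lines.
order = {
    "id": 1,
    "line_items": [
        {"sku": "A", "taxes": [{"rate": 0.05}, {"rate": 0.02}]},
        {"sku": "B", "taxes": []},
    ],
}

print(count_rows(order, max_levels=2))  # 5: 1 parent + 2 line items + 2 taxes
print(count_rows(order, max_levels=1))  # 3: 1 parent + 2 line items
print(count_rows(order, max_levels=0))  # 1: no un-nesting
```

Under these assumptions, one source record produces five replicated rows with two levels of un-nesting and only one with un-nesting turned off.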

Why is my replicated rows count more than the data at source?

You may be surprised by the totals when viewing the number of replicated rows in Daton. You might think to yourself:

How did Daton replicate so many rows when neither my source nor my destination has that many?

This is a common question. Remember that row usage in Daton refers to the total number of replicated rows, so the number of rows in the source will not always match the usage reported in Daton.

Common issues that impact usage

The number of rows Daton replicates is directly impacted by:

1. The number of tables set to replicate. The more tables you select, the higher the potential for replicated rows. Select only the tables that you need for your downstream reporting. You can always come back and add more tables by editing your integration.

2. The Replication Methods used for replicating tables. Tables using Full Table Replication can increase your row usage. Tables that are replicated in full are marked in the UI by the letter "F". We recommend that you create a separate integration for tables that require full replication at a higher frequency, so that the other tables are not replicated more often than needed.

3. The replication frequency. Users often select a replication frequency that is higher than their business requirements need. We recommend that you evaluate your requirements and set the frequency based on business needs.

4. Un-nesting of nested columns. Daton un-nests data before loading it into the data warehouse. If a nested column is not required to support business requirements, we recommend that you de-select the column on the configuration pages to reduce the number of rows replicated.
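
The effect of replication method and frequency can be estimated with simple arithmetic. The Python sketch below is a rough estimate only. It assumes that a full-load table rewrites all of its rows on every run and that an incremental table replicates each changed row once. The function names and numbers are hypothetical and do not reflect Daton pricing.

```python
def monthly_rows_full(table_rows: int, runs_per_day: int, days: int = 30) -> int:
    """Full Table Replication: every run replicates every row again."""
    return table_rows * runs_per_day * days


def monthly_rows_incremental(changed_rows_per_day: int, days: int = 30) -> int:
    """Incremental replication: only new or changed rows are replicated
    (assuming each change is picked up once)."""
    return changed_rows_per_day * days


# A 10,000-row table replicated in full, hourly versus daily.
print(monthly_rows_full(10_000, runs_per_day=24))  # 7200000
print(monthly_rows_full(10_000, runs_per_day=1))   # 300000

# The same table replicated incrementally with 500 changed rows per day.
print(monthly_rows_incremental(500))               # 15000
```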

How to identify full load/replication tables?

Tables that are loaded in full are marked with an "F" in the UI.

What is the default Un-nesting behavior for each destination?

Warehouse	Default Behaviour
Amazon Redshift	Un-nest up to 2 levels
Google BigQuery	No Un-nesting
Snowflake	Un-nest up to 2 levels
Oracle Autonomous Warehouse	Un-nest up to 2 levels
Amazon S3	Un-nest up to 2 levels
All MySQL	Un-nest up to 2 levels
All PostgreSQL	Un-nest up to 2 levels
Google Cloud Storage	Un-nest up to 2 levels

For File-based destinations, additional files are created for each nested field. Refer to this document to understand how to query the parent and child tables created by Daton.

Managing and Optimizing Usage

While you can change your plan at any time to accommodate your changing data volume needs, below are some ways to reduce your usage and stay within your plan’s row limits:

Identify integrations with high usage
Daton supports incremental replication for integrations whenever possible. However, there are cases where a high volume of replication may be unavoidable or even necessary. For example:

Data that contains many nested structures
A source that generates large amounts of data
A high number of tables using Full Table Replication
An integration that doesn’t currently support table selection
Advertising sources that have to tackle attribution problems

You can navigate to the specific integration, review its usage over a period of time, identify which tables are causing a high volume of replicated rows, and take appropriate measures.

Selecting the right replication frequency

The Replication Frequency setting applies at the integration level, not at the individual table level. Bear this in mind when an integration has many tables that can only be replicated in full with each replication job.

De-select data that is not needed

We often notice that users select all the data available in the source when they set up their integrations, usually to gain familiarity with the data. Whatever the reason, we suggest that you periodically review your integrations to check whether the data you are replicating is actually being used.

We suggest that you only select the data (tables and nested columns) that you need to meet your business requirements.

Users can edit their integrations and modify the setup at any time to rein in costs. Most integrations support table-level and column-level selection.

Selecting replication keys for databases

For integrations that support the selection of Replication Keys, we recommend that you select the appropriate keys whenever possible. Not selecting a replication key may result in tables being replicated in full, which increases both the load on the source databases and your costs in Daton.
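
The difference a replication key makes can be illustrated with a short Python sketch. It assumes a key such as an updated_at timestamp and a stored bookmark holding the highest key value seen in the previous job. The rows_to_replicate function, the column names, and the bookmark mechanism are hypothetical and only illustrate the general idea of key-based incremental replication, not Daton's implementation.

```python
from datetime import datetime


def rows_to_replicate(rows, replication_key=None, bookmark=None):
    """Return the rows a job would extract.

    Without a replication key (or without a bookmark from a previous job),
    the whole table is extracted. With both, only rows whose key value is
    greater than the bookmark are extracted.
    """
    if replication_key is None or bookmark is None:
        return list(rows)
    return [row for row in rows if row[replication_key] > bookmark]


# Hypothetical source table.
table = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]
last_bookmark = datetime(2024, 1, 4)

full = rows_to_replicate(table)
incremental = rows_to_replicate(table, "updated_at", last_bookmark)

print(len(full))                       # 3
print([row["id"] for row in incremental])  # [2, 3]
```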

Pause integrations
You can pause an integration while you are trying to figure out how to optimize your setups.

Pausing an integration will prevent additional records from being extracted. If jobs are running when the integration is paused, these jobs will continue for that cycle and rows extracted as a part of these jobs will be counted towards your quota. A new set of jobs will not be triggered once the integration is paused.
