Usage and Billing FAQs
What counts as a source?
Any system from which data needs to be extracted is considered a source in Daton. For example, Facebook and Shopify are each considered a source.
What counts as an integration?
A source can have multiple integrations.
For example, if Shopify is your source and you run a separate Shopify store for each country, you set up one integration for each store whose data you want to replicate. Each integration counts towards the integration quota of your plan.
What counts as a replicated row?
Any row that is appended (inserted) or upserted (updated or overwritten) in the warehouse counts as a replicated row in Daton.
Daton supports multiple loading mechanisms and users have complete control over this behavior in most sources.
- Users can append data
- Users can un-nest data and create child tables in the warehouse
Most common scenarios for replication:
1. A new row is a row that has never been replicated through Daton. This is a common occurrence when you use append mode for loading data.
2. An updated row is an existing row that has been changed in the warehouse. This is a common occurrence when you use upsert mode for loading data.
3. A sub-row or un-nested row is a row that is created by un-nesting nested data structures. Child tables are created and kept up to date, and their rows count towards billing.
4. A copy of an existing row. For example:
- Rows in tables that are replicated in full during each replication job or
- Rows replicated as a result of resetting Replication Keys or
- Rows replicated as a result of resetting loading mode or
- Rows replicated as a result of re-loading data for any table.
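To make the four scenarios concrete, here is a minimal Python sketch that tallies replicated rows for a made-up job history. The `JobResult` class, its field names, and all numbers are hypothetical illustrations, not part of Daton's product or API.

```python
from dataclasses import dataclass


@dataclass
class JobResult:
    """Rows written to the warehouse by one replication job (hypothetical shape)."""
    new_rows: int = 0       # scenario 1: rows appended for the first time
    updated_rows: int = 0   # scenario 2: existing rows changed by an upsert
    unnested_rows: int = 0  # scenario 3: child-table rows created by un-nesting
    copied_rows: int = 0    # scenario 4: rows re-written by full loads, resets, or re-loads


def replicated_rows(jobs: list[JobResult]) -> int:
    """Sum every row written, since each scenario counts towards usage."""
    return sum(
        j.new_rows + j.updated_rows + j.unnested_rows + j.copied_rows for j in jobs
    )


jobs = [
    JobResult(new_rows=1_000, unnested_rows=2_500),  # initial append with child tables
    JobResult(updated_rows=40, unnested_rows=100),   # later upsert run
    JobResult(copied_rows=1_000),                    # same table re-loaded in full
]
total = replicated_rows(jobs)
print(total)
```

In this example the source table holds only 1,000 rows, yet the three jobs together replicate 4,640 rows.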
How does Un-nesting impact billing?
In most sources, users have granular, column-level control over how data is loaded into the warehouse. For most destinations, the default behavior is to un-nest the JSON data provided by the source up to two levels deep.
If you don't want a particular integration, table, or column to be un-nested, use the un-nesting controls on the integration setup pages to change this behavior.
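The effect of un-nesting on row counts can be sketched in Python as follows. This is only an illustration under stated assumptions: the `unnest` function, the `<parent>_<field>` child-table naming, and the sample order are hypothetical, and Daton's actual un-nesting logic may differ.

```python
def unnest(record: dict, table: str, max_levels: int = 2, level: int = 0) -> dict:
    """Split one nested record into a parent row plus child-table rows.

    A non-empty list of objects becomes rows in a child table named
    '<parent>_<field>' (hypothetical naming) until max_levels is reached;
    anything nested deeper stays inside its row unchanged.
    """
    tables = {table: []}
    row = {}
    for key, value in record.items():
        is_nested = (
            isinstance(value, list)
            and bool(value)
            and all(isinstance(v, dict) for v in value)
        )
        if is_nested and level < max_levels:
            for child in value:
                child_tables = unnest(child, f"{table}_{key}", max_levels, level + 1)
                for name, rows in child_tables.items():
                    tables.setdefault(name, []).extend(rows)
        else:
            row[key] = value
    tables[table].append(row)
    return tables


order = {
    "id": 1,
    "line_items": [
        {"sku": "A", "taxes": [{"rate": 0.1}, {"rate": 0.05}]},
        {"sku": "B", "taxes": [{"rate": 0.1}]},
    ],
}
tables = unnest(order, "orders")
row_counts = {name: len(rows) for name, rows in tables.items()}
print(row_counts)
```

Here a single source record produces six rows: one parent row, two line-item rows, and three tax rows. Calling `unnest(order, "orders", max_levels=0)` keeps everything in one row, which mirrors switching un-nesting off.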
Why is my replicated rows count more than the data at source?
You may be surprised by the totals when viewing the number of replicated rows in Daton. You might think to yourself:
How did Daton manage to replicate so many rows when neither my source nor my destination has that many rows?
This is a common question we get. Remember that row usage in Daton refers to the total number of replicated rows. This means that the number of rows in the source will not always be the same as the number of rows in Daton.
Common issues that impact usage
The number of rows Daton replicates is directly impacted by:
1. The number of tables set to replicate. The more tables you select, the higher the potential for replicated rows. Select only the tables that you need for your downstream reporting. You can always come back and add more tables by editing your integration.
2. The replication methods used for replicating tables. Tables using Full Table Replication can increase your row usage. Tables that are replicated in full are marked in the UI by the letter "F". We recommend that you create a separate integration for tables that require full replication with a higher frequency.
3. The replication frequency. Users often select a replication frequency that is higher than their business requirements call for. We recommend that you evaluate your requirements and set the frequency based on business needs.
4. Un-nesting of nested columns. Daton un-nests data before loading it into the data warehouse. If the data in a nested column is not required to support business requirements, we recommend that you de-select the column on the configuration pages to reduce the number of rows replicated.
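The combined effect of Full Table Replication and replication frequency is simple multiplication. The sketch below uses assumed numbers (a 10,000-row table whose size stays constant over a 30-day month); the function name is hypothetical.

```python
def monthly_replicated_rows(table_rows: int, runs_per_day: int, days: int = 30) -> int:
    """Rows replicated when a table is re-loaded in full on every run.

    Assumes the table size stays constant over the period.
    """
    return table_rows * runs_per_day * days


hourly = monthly_replicated_rows(10_000, runs_per_day=24)
daily = monthly_replicated_rows(10_000, runs_per_day=1)
print(hourly, daily)
```

With these assumed numbers, hourly full loads replicate 7,200,000 rows in a month, compared with 300,000 rows for daily loads of the same table.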
How to identify full load/replication tables?
Tables that are loaded in full are marked with an "F" in the UI.
What is the default Un-nesting behavior for each destination?
| Warehouse | Default Behaviour |
|---|---|
| Amazon Redshift | Un-nest up to 2 levels |
| Google BigQuery | No Un-nesting |
| Snowflake | Un-nest up to 2 levels |
| Oracle Autonomous Warehouse | Un-nest up to 2 levels |
| Amazon S3 | Un-nest up to 2 levels |
| All MySQL | Un-nest up to 2 levels |
| All PostgreSQL | Un-nest up to 2 levels |
| Google Cloud Storage | Un-nest up to 2 levels |
For File-based destinations, additional files are created for each nested field. Refer to this document to understand how to query the parent and child tables created by Daton.
Managing and Optimizing Usage
While you can change your plan at any time to accommodate changing data volumes, here are some ways to reduce your usage and stay within your plan’s row limits:
Identify integrations with high usage
Daton supports incremental replication for integrations whenever possible. However, there are cases where a high volume of replication may be unavoidable or even necessary. For example:
- The data contains many nested structures
- A source generates large amounts of data
- A high number of tables use Full Table Replication
- The integration doesn’t currently support table selection
- Advertising sources have to tackle attribution problems
To find the cause, navigate to the specific integration, review its usage over a period of time, identify which tables are causing a high volume of replicated rows, and take appropriate measures.
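If you note down per-table row counts while reviewing an integration, a few lines of Python can rank the tables by their share of usage. The figures and table names below are made up, and the sketch assumes you have collected the counts by hand; it does not rely on any particular Daton export or API.

```python
from collections import Counter

# (table, rows replicated in one job); hypothetical figures noted down by hand
usage = [
    ("orders", 1_200),
    ("products", 50_000),
    ("orders", 1_300),
    ("products", 50_000),
    ("customers", 300),
]

totals = Counter()
for table, rows in usage:
    totals[table] += rows

grand_total = sum(totals.values())
ranked = totals.most_common()  # highest usage first

for table, rows in ranked:
    print(f"{table}: {rows} rows ({rows / grand_total:.0%} of usage)")
```

In this made-up history, `products` dominates usage because the same 50,000 rows are written on every job, which is the pattern to look for when deciding which tables to de-select or move to a less frequent integration.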
Selecting the right replication frequency
Users often select a replication frequency that is higher than their business requirements call for. We recommend that you evaluate your requirements and set the frequency based on business needs.
The replication frequency setting applies at the integration level, not at the individual table level. Bear this in mind when an integration has many tables that can only be replicated in full with each replication job.
De-select data that is not needed
We often notice that users select all the data available in the source when they set up their integrations, usually to gain familiarity with the data. Whatever the reason, we suggest that you periodically review your integrations to check whether the data you are replicating is actually being used.
We suggest that you only select the data (tables and nested columns) that you need to meet your business requirements.
Users can edit their integrations and modify the setup at any time to rein in costs. Most integrations support table-level and column-level selection.
Selecting replication keys for databases
For integrations that support the selection of Replication Keys, we recommend that you select the appropriate keys whenever possible. If no replication key is selected, tables may be replicated in full, which increases the load on the source databases as well as your costs in Daton.
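The difference a replication key makes can be sketched as follows. This is a simplified illustration, not Daton's implementation: the `rows_to_replicate` function and the choice of an `updated_at` timestamp column as the replication key are assumptions.

```python
from datetime import datetime


def rows_to_replicate(source_rows, last_key=None):
    """Pick the rows one replication job would extract.

    With no replication key state (last_key is None) every row is extracted,
    which corresponds to a full load. Otherwise only rows whose key is greater
    than the last replicated value are extracted.
    """
    if last_key is None:
        return list(source_rows)
    return [r for r in source_rows if r["updated_at"] > last_key]


source_rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 2)},
    {"id": 3, "updated_at": datetime(2024, 1, 3)},
]

full = rows_to_replicate(source_rows)
incremental = rows_to_replicate(source_rows, last_key=datetime(2024, 1, 2))
print(len(full), len(incremental))
```

Without a key, all three rows are extracted on every job; with the key, only the row changed since the last job is extracted, so both the source load and the replicated row count drop.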
Pause integrations
You can pause an integration while you are trying to figure out how to optimize your setups.
Pausing an integration will prevent additional records from being extracted. If jobs are running when the integration is paused, these jobs will continue for that cycle and rows extracted as a part of these jobs will be counted towards your quota. A new set of jobs will not be triggered once the integration is paused.