Bava

Reputation: 73

Business Intelligence Datasource Performance - Large Table

I use Tableau and have a table with 140 fields. Due to the size/width of the table, the performance is poor. I would like to remove fields to increase reading speed, but my user base is so large that at least one person uses each of the fields, while 90% use the same ~20 fields.

What is the best solution to this issue? (Tableau is our BI tool, BigQuery is our database)

What I have done thus far: In Tableau, it isn't clear how to use dynamic data sources that change based on the fields selected. Ideally, I would like to have smaller views OR denormalized tables. As a user makes selections in Tableau, the underlying data source would switch to the table or view containing those fields.

I have tried a simple version of a large view, but it performed worse than my large table and read significantly more data (remember, I am on BigQuery, so I care very much about bytes read due to costs).
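For context on the cost concern: BigQuery bills by the columns a query actually scans, so (with purely illustrative table and field names, not my real schema) these two queries against the same table read very different amounts of data:

    -- Scans all 140 columns and bills for every byte in them.
    SELECT *
    FROM `my_project.my_dataset.orders_wide`;

    -- Scans (and bills for) only the three listed columns.
    SELECT order_id, order_date, revenue
    FROM `my_project.my_dataset.orders_wide`;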

Upvotes: 2

Views: 109

Answers (1)

MonteCarloSims

Reputation: 1771

Suggestion 1: Extract your data.

Especially when it comes to datasources that are pay-per-byte-queried (BigQuery, Athena, etc.), extracts make a great deal of sense, depending on how 'fresh' the data must be for the users. (Of course all users will say 'live is the only way to go', but dig into this a little and see what they actually need.) Refreshes can be scheduled as often as every 15 minutes. The real power of refreshes comes in the form of 'incremental refreshes', whereby only new records are added (along an index of int or date). This is a great way to reduce costs, provided your BigQuery table is partitioned (which it should be). Since Tableau extracts are contained within .hyper files, a structure of Tableau's own design/control, they are extremely fast and optimized perfectly for use in Tableau.
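As a rough sketch of the BigQuery side, assuming the wide table has a usable timestamp column (all names below are illustrative, not from the question):

    -- Illustrative only: partition the wide table on a date/timestamp column so
    -- that incremental extract refreshes keyed on the same column only bill for
    -- the newest partitions rather than rescanning the whole table.
    CREATE TABLE `my_project.my_dataset.orders_wide_partitioned`
    PARTITION BY DATE(order_timestamp) AS
    SELECT *
    FROM `my_project.my_dataset.orders_wide`;

When the incremental refresh filters on order_timestamp, BigQuery can prune partitions and only charge for the data it actually reads.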

Suggestion 2: Create 3 data sources (or more). Certify these datasources after validating that they provide correct information, and provide users with clear descriptions. (One way to back sources 2 and 3 with BigQuery views is sketched after the list.)

  1. Original Large Dataset.
  2. Subset of ~20 fields for the 90%.
  3. Remainder of fields for the 10%.
  4. Extract of 1.
  5. Extract of 2.
  6. Extract of 3.
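A minimal sketch of the views that could sit behind sources 2 and 3, assuming hypothetical table/field names and keeping a shared key in both so they remain relatable:

    -- Source 2: the ~20 fields that 90% of users need.
    CREATE OR REPLACE VIEW `my_project.my_dataset.orders_core` AS
    SELECT
      order_id,      -- shared key, kept in both views
      order_date,
      customer_id,
      revenue        -- ...plus the rest of the common ~20 fields
    FROM `my_project.my_dataset.orders_wide`;

    -- Source 3: the remaining fields for the other 10%, plus the key.
    CREATE OR REPLACE VIEW `my_project.my_dataset.orders_extended` AS
    SELECT
      order_id,      -- shared key
      promo_code,
      warehouse_id   -- ...plus the remaining ~120 fields
    FROM `my_project.my_dataset.orders_wide`;

Extracts (sources 4-6) can then be built directly on top of the original table and these views, so field names stay identical across all six.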

Importantly, if field names match in each datasource (i.e., they are never changed manually), then it should be easy for a user to 'scale up' to a larger dataset as needed. This means they could always start their exploration with a small subset of the data and then use the 'replace data source' feature to switch to a different datasource while keeping their views intact. (Scaling down this way would not work as well, if at all, though.)

Upvotes: 3
