Reputation: 95
What performs better (what returns queries faster) with Tableau (a read-only program) when Tableau is connected to tables of data through SQL Server? Multiple tall, thin tables that are joined or a single short and wide table?
The tall and thin tables have many rows but few columns and are joined. The short and wide table has fewer rows, but more columns.
I believe the tall and thin option returns queries faster because there is less redundant data, less columns (creates faster indexing), less NULLS, and less indexing (because there's less columns), but I need at least a second opinion, so please let me know yours.
The reason I'm interested in this question is to improve the query performance by our clients when there query our server for data to render their visualizations.
Upvotes: 0
Views: 885
Reputation: 95
After lots of researching, I've found a general answer. Generally, and especially with SQL Server and Tableau, you want to steer towards normalizing your tables, so you can avoid redundant data and thus, your table has less data to scan, making it's queries faster to execute. However, you don't want to normalize your tables to a point where the joins between the tables actually cause the query to take longer than if the query was just being send to one short,wide table. Ultimately, you're just going to have to test to see what amount of normalization/denormalization is best for the quickest query return.
Upvotes: 0
Reputation: 2275
It depends largely on what you're trying to achieve. For some applications, it's better to have fewer entries with many fields, and for others it's better to have many entries with fewer fields.
Keep in mind that Tableau is not like Excel nor SQL, meaning, you should keep data manipulation to a minimum as some calculations are not easy/possible to be done in Tableau (and some are possible but involves exporting data and reconnecting to it). Tableau should be used mostly for data visualization purposes
Additionally, it's very troublesome to compare different measures in the same chart. Meaning, if you want to compare sum(A) to sum(B), you'll have to plot 2 different charts (and not put both in the same). I find it easier to have few measure fields and lots of dimensions. That way I can easily slice/compare measures. In the last example, instead of having 1 entry with A and B measures, I would have 2 entries, one with A measure and one dimension (saying it's A that is being measured) and one with B measure and one dimension (in the same respectively fields)
BUT that doesn't mean you should go always with "tall thin tables". You need to see what you're trying to achieve and what format better suits your needs (and Tableau design). And unless you're working with really big tables and your analysis are done many times a day (or real time) and performance is a very big issue, then you should focus in what makes your life easier (specially when you have to change and adapt analysis later on).
And for performance, in Tableau I follow 3 rules:
1) Always extract (data to a tde) - it's way faster than most of other database format (I didn't test all, but it's way faster the csv,mdb, xls or SQL connected directly)
2) Never use Tableau links - Unless it does not affect performance (e.g., nomenclature for a low range field) it's better that all your information is already in the same database
3) Remove the thrash - It's very appealing having all information possible in a database, but it also comes at a performance cost. I try to keep only the information necessary for the analysis, to the limits of flexibility I need. Filtering the data is ok, putting the filter in context is better, but filtering on the extract or in the data source itself is the best solution
Upvotes: 1