Orysya Stus

Reputation: 91

Working with huge csv files in Tableau

I have a large csv file (1,000 rows x 70,000 columns) which I want to build as a union of 2 smaller csv files (since these csv files will be updated in the future). In Tableau, working with such a large csv file results in very long processing times and sometimes causes Tableau to stop responding. I would like to know what the better ways are of dealing with such large csv files, e.g. by splitting the data, converting the csv to another file type, connecting to a server, etc. Please let me know.

Upvotes: 0

Views: 4531

Answers (2)

matt_black

Reputation: 1330

The problem isn't the size of the csv file: it is the structure. Almost anything trying to digest a csv will expect lots of rows but not many columns. Usually the columns define the types of data (e.g. customer number, transaction value, transaction count, date, ...) and the rows define instances of the data (all the values for an individual transaction).

Tableau can happily cope with hundreds (maybe even thousands) of columns and millions of rows (I've happily ingested 25-million-row CSVs).

Very wide tables usually emerge because you have a "pivoted" analysis, with one set of data categories along the columns and another along the rows. For effective analysis you need to undo the pivoting (or derive the data from its source unpivoted). Cycle through the complete table (you can even do this in Excel VBA, despite the number of columns, by reading the CSV line by line rather than opening the file). Convert the first row (which is probably the column headings) into a new column, so that each new row contains one combination of original row label and column header, plus the data value from the corresponding cell in the CSV file. The new table will be only 3 columns wide but will contain all the data from the CSV (assuming the CSV was structured the way I assumed). If I've misunderstood the structure of the file, you have a much bigger problem than I thought!
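
If VBA isn't convenient, here is a minimal sketch of that unpivoting in Python. It reads the CSV line by line, so the 70,000 columns never have to be held as a full in-memory table; the file names and the assumption that the first column holds the row label are placeholders, not taken from the question.

    import csv

    # Stream the wide CSV and write a 3-column "long" CSV:
    # row label, original column header, cell value.
    with open("wide.csv", newline="") as src, \
         open("long.csv", "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        headers = next(reader)  # first row: row-label header + column headers
        writer.writerow([headers[0], "category", "value"])

        for row in reader:
            row_label, values = row[0], row[1:]
            for col_header, value in zip(headers[1:], values):
                writer.writerow([row_label, col_header, value])

The output has one row per (row label, column header) pair, which is the tall, narrow shape Tableau handles well.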

Upvotes: 0

Nick

Reputation: 7451

The first thing you should ensure is that you are accessing the file locally and not over a network. Sometimes the difference is minor, but in some cases a network connection can cause a major slowdown when Tableau reads the file.

Beyond that, your file is pretty wide and should be normalized, so that you get more rows and fewer columns. Tableau will most likely read it faster because it has fewer columns to analyze (data types, etc.).

If you don't know how to normalize the CSV file, you can use a tool like: http://www.convertcsv.com/pivot-csv.htm
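
As a programmatic alternative to that web tool (a 70,000-column file may be too large to paste into it), pandas can do the same normalization in a single melt call. This is only a sketch; the file names and the assumption that the first column is the identifier are mine, not the original poster's.

    import pandas as pd

    # Read the wide file (1,000 rows x 70,000 columns) ...
    wide = pd.read_csv("wide.csv")

    # ... and unpivot it: keep the first column as the identifier and
    # turn every other column header into a value of a "category" column.
    long = wide.melt(id_vars=wide.columns[0],
                     var_name="category",
                     value_name="value")

    # Roughly 1,000 x 69,999 = ~70 million rows, 3 columns.
    long.to_csv("long.csv", index=False)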

Once you have the file normalized and connected in Tableau, you may want to create an extract inside Tableau for improved performance and file compression.

Upvotes: 1
