Reputation: 471
I have a CSV file that was generated by exporting a Tableau table to CSV, but I cannot manage to open it in Python.
I have tried to use pd.read_csv, but that fails.
import pandas as pd
#path to file
path = "tableau_crosstab.csv"
data = pd.read_csv(path, encoding="ISO-8859-1")
This reads the file without raising an error, but the result is just a series of rows with one character per row, and some weird characters at the head of the frame.
ÿþd
o
m
a
i
and so on. When I import the file in Excel I have to select tab as the separator, but when I try that here it fails:
import pandas as pd
#path to file
path = "tableau_crosstab.csv"
data = pd.read_csv(path, encoding="ISO-8859-1", sep='\t')
CParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 2
I did try to open the file with codecs, and it reports the encoding as 'cp1252', but using that as the encoding fails too.
I also tried reading it as utf-8, and that fails as well. I am running out of ideas for how to solve this.
Here is a link to a copy of the file, if someone could take a look: http://www.mediafire.com/file/6dtxo2deczwy3u2/tableau_crosstab.csv
Upvotes: 3
Views: 2888
Reputation: 393923
Your file starts with a Unicode BOM (byte order mark), specifically the one for UTF-16 LE.
Try:
data = pd.read_csv(path, encoding="utf-16", sep='\t')
The funny characters you see, ÿþ, correspond to the hex bytes FF FE, which is the UTF-16 little-endian byte order mark. The Wikipedia page on the byte order mark lists all the various marks.
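If you want to check for a byte order mark yourself rather than guess at encodings, something along these lines could work. It is only a minimal sketch using the standard library: the helper name sniff_bom_encoding and the sample bytes are made up for illustration, and it only recognises files that actually start with a BOM.

```python
import codecs
from typing import Optional

# UTF-32 is checked first because its little-endian mark (FF FE 00 00)
# begins with the UTF-16 little-endian mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(raw: bytes) -> Optional[str]:
    """Return a codec name if raw starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return None


# Hypothetical stand-in for the first bytes of a Tableau export:
# tab-separated text encoded as UTF-16 LE with a leading BOM.
sample = codecs.BOM_UTF16_LE + "domain\tclicks\ntest1.no\t633\n".encode("utf-16-le")

print(sample[:2].decode("ISO-8859-1"))  # the BOM bytes as seen through ISO-8859-1
print(sniff_bom_encoding(sample))       # codec name to pass as encoding=
print(sample.decode("utf-16"))          # decodes cleanly, BOM consumed
```

For a real file you would read the first few bytes in binary mode, e.g. open(path, "rb").read(4), pass them to the helper, and hand the returned name to pd.read_csv as the encoding argument.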
I get the following when reading your csv:
In[4]:
data = pd.read_csv(r'C:\tableau_crosstab.csv', encoding='utf-16', sep='\t')
data
Out[4]:
      domain Month of date impressions clicks
0   test1.no        jun.17     725 676    633
1   test1.no        mai.17     422 995    456
2   test1.no        apr.17     241 102    316
3   test1.no        mar.17     295 157    260
4   test1.no        feb.17     122 902    198
5   test1.no        jan.17     137 972    201
6   test1.no        des.16     274 435    361
7  test2.com        jun.17   3 083 373  1 638
8  test2.com        mai.17   3 370 620  2 036
9  test2.com        apr.17   2 388 933  1 483
10 test2.com        mar.17   2 410 675  1 581
11 test2.com        feb.17   2 311 952  1 682
12 test2.com        jan.17   1 184 787    874
13 test2.com        des.16   2 118 594  1 738
14 test3.com        jun.17     411 456     41
15 test3.com        mai.17     342 048     87
16 test3.com        apr.17     197 058    108
17 test3.com        mar.17     288 949    156
18 test3.com        feb.17     230 970    130
19 test3.com        jan.17     388 032    115
20 test3.com        des.16   1 693 442    166
21  test4.no        jun.17     521 790    683
22  test4.no        mai.17     438 037    541
23  test4.no        apr.17     618 282  1 042
24  test4.no        mar.17     576 413    956
25  test4.no        feb.17     451 248    636
26  test4.no        jan.17     293 217    471
27  test4.no        des.16     641 491    978
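Note that the impressions and clicks values in the output above use a space to group thousands, so they will most likely come through as strings rather than integers. A rough sketch of one way to convert them, assuming the grouping character is some kind of whitespace (plain or non-breaking space); parse_grouped_int is a made-up helper name, not part of pandas:

```python
def parse_grouped_int(text: str) -> int:
    """Convert a string such as '3 083 373' to the integer 3083373.

    Assumes digits are grouped with whitespace only; str.split() with no
    arguments also splits on non-breaking spaces.
    """
    return int("".join(text.split()))


# Values copied from the output above, plus one with a non-breaking space.
print(parse_grouped_int("3 083 373"))
print(parse_grouped_int("633"))
print(parse_grouped_int("1\u00a0638"))
```

A function like this can be supplied per column through the converters argument of pd.read_csv, for example converters={'impressions': parse_grouped_int, 'clicks': parse_grouped_int}, assuming those are the column names in your file.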
Upvotes: 6