Erlinska

Reputation: 433

Reading gzip file with pandas doesn't work

I've stumbled into a small issue with the pandas pd.read_csv function:

I've downloaded very large amounts of data as .csv.gz files, and I'd rather keep them compressed on my computer because of the tremendous amount of space they take up.

I want to load them into Python. To do so, I've been using the usual pd.read_csv function with the compression='gzip' argument. pandas manages to read the CSV with the correct number of columns and the correct index length, but the data is completely broken:

tick = pd.read_csv(r"D:\Finance python\Data\EUR_USD\Tick\2015\1.csv.gz", compression='gzip')

tick.head()
Out[30]: 
    D  Unnamed: 1  Unnamed: 2
0 NaN         NaN         NaN
1 NaN         NaN         NaN
2 NaN         NaN         NaN
3 NaN         NaN         NaN
4 NaN         NaN         NaN

Would anyone have an idea of what I'm doing wrong when I try to read the file?

pandas clearly recognizes that the data is gzip-compressed, but I have no idea why it doesn't manage to extract it correctly.

Thanks

The data that I'm trying to read: https://tickdata.fxcorporate.com/EURUSD/2015/1.csv.gz

Upvotes: 2

Views: 2988

Answers (1)

etopylight

Reputation: 1319

A quick look at the original CSV file shows that it contains null characters (^@), which is why pandas cannot parse it correctly.
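
If you'd like to check for null characters without opening the file in an editor, something along these lines should work. It is only a rough sketch: the function name count_null_bytes and the 64 KiB sample size are my own choices for illustration, not anything pandas or gzip provides.

```python
import gzip


def count_null_bytes(path, sample_size=64 * 1024):
    """Count NUL bytes in the first `sample_size` decompressed bytes.

    Hypothetical helper for illustration; only a sample is inspected,
    so a result of 0 does not prove the whole file is clean.
    """
    with gzip.open(path, "rb") as f:
        sample = f.read(sample_size)
    return sample.count(b"\x00")
```

Calling count_null_bytes('1.csv.gz') and getting a non-zero result would suggest the file has the same problem.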

You can clean up those characters with a shell command:

gzip -dc 1.csv.gz | tr -d '\0' | gzip > 1_clean.csv.gz
  • gzip -dc decompresses the file into stdout
  • tr -d '\0' deletes the null characters
  • gzip compresses it back to a gzipped file

After that, pandas should be able to read it correctly.


UPDATE

If you don't have access to a shell, you can still use Python to do the trick, although it will be slower:

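import gzip

# Read the whole decompressed file into memory
with gzip.open('1.csv.gz', 'rb') as f:
    data = f.read()

# Strip the null characters and write the result back out as gzip
with gzip.open('1_clean.csv.gz', 'wb') as f:
    f.write(data.decode('utf-8').replace('\x00', '').encode('utf-8'))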
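
Since the files are described as very large, reading everything into memory at once may not be practical. A possible variant, sketched below, processes the file in chunks and works on raw bytes, so no decoding step is needed. The name strip_nulls_gzip and the 1 MiB chunk size are assumptions for illustration. Like the snippet above, it assumes the null bytes carry no meaning and can simply be dropped.

```python
import gzip


def strip_nulls_gzip(src, dst, chunk_size=1024 * 1024):
    """Copy gzip file `src` to gzip file `dst`, dropping NUL bytes.

    Hypothetical helper for illustration. A NUL is a single byte, so
    removing it chunk by chunk is safe across chunk boundaries.
    """
    with gzip.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            fout.write(chunk.replace(b"\x00", b""))
```

For example, strip_nulls_gzip('1.csv.gz', '1_clean.csv.gz') would produce a cleaned file that can then be passed to pd.read_csv with compression='gzip'.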

Upvotes: 1
