Simon

Reputation: 31

Pandas throwing error on compressed file (xz)

import pandas as pd
import lzma

df = pd.read_csv('final.csv', header=None)

with open('/xzfolder/final.xz', 'wb') as f:
    f.write(lzma.compress(df.to_records(index=False), format=lzma.FORMAT_XZ))

df = pd.read_csv('/xzfolder/final.xz', header=None)

Above is my code. I compress my CSV with lzma, but when I read the compressed file back I get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 8: invalid continuation byte

Upvotes: 1

Views: 596

Answers (1)

Mortz

Reputation: 4879

I tried your code and hit the same error. I also tried to decompress the created file with the xz command-line utility on Linux, and that produced garbage as well, which indicates that something goes wrong when the file is created.
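
A minimal sketch of why the decompressed output looks like garbage. It assumes that what lzma receives from the record array is its raw in-memory bytes, which records.tobytes() makes explicit here; the sample DataFrame is made up for illustration and is not the asker's data.

```python
import lzma
import pandas as pd

# Hypothetical sample data standing in for final.csv
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# to_records() builds a NumPy record array; its bytes are packed binary
# values, not comma-separated text
records = df.to_records(index=False)
raw = records.tobytes()

# What a CSV reader would expect to find after decompression
csv_bytes = df.to_csv(index=False).encode("utf-8")

# Round-trip the raw bytes through xz to see what ends up in the file
payload = lzma.decompress(lzma.compress(raw, format=lzma.FORMAT_XZ))

print(len(payload), records.nbytes)  # same size: the file holds the array memory
print(payload == csv_bytes)          # False: the file does not hold CSV text
```

The compression step itself is lossless; the problem is that the bytes going in are a binary dump, so a text parser cannot decode them.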

I changed the code to use .to_string().encode(), which forces a bytes object, and it works:

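import lzma
import pandas as pd

df = pd.read_csv('somefile.txt', header=None)

# to_string() returns str; encode() turns it into the bytes that lzma.compress expects
with open('somez.xz', 'wb') as f:
    f.write(lzma.compress(df.to_string().encode(), format=lzma.FORMAT_XZ))

# pandas infers xz compression from the file extension
df_re = pd.read_csv('somez.xz')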
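
One caveat with the approach above: to_string() writes a whitespace-aligned table rather than comma-separated values, so the file reads back without an error but probably not with the original columns. A possible variant, sketched here with made-up sample data and temporary file names, is to serialize with to_csv() instead, or to let pandas handle the xz compression itself through the compression parameter.

```python
import lzma
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the paths below are temporary and only for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})
tmpdir = tempfile.mkdtemp()

# Option 1: serialize to CSV text, encode to bytes, compress manually
path_manual = os.path.join(tmpdir, "manual.csv.xz")
with open(path_manual, "wb") as f:
    csv_bytes = df.to_csv(index=False).encode("utf-8")
    f.write(lzma.compress(csv_bytes, format=lzma.FORMAT_XZ))

# Option 2: let pandas write the xz-compressed CSV directly
path_pandas = os.path.join(tmpdir, "pandas.csv.xz")
df.to_csv(path_pandas, index=False, compression="xz")

# read_csv infers the compression from the .xz extension in both cases
df_manual = pd.read_csv(path_manual)
df_pandas = pd.read_csv(path_pandas)

print(df_manual.equals(df), df_pandas.equals(df))
```

Option 2 avoids the manual lzma call entirely, which removes the chance of passing the wrong kind of object to lzma.compress.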

Upvotes: 2
