Reputation: 31
import pandas as pd
import lzma
df = pd.read_csv('final.csv', header=None)
with open('/xzfolder/final.xz', 'wb') as f:
    f.write(lzma.compress(df.to_records(index=False), format=lzma.FORMAT_XZ))
df = pd.read_csv('/xzfolder/final.xz', header=None)
Above is my code. I am compressing my CSV with lzma, but when I read the compressed file back I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 8: invalid continuation byte
Upvotes: 1
Views: 596
Reputation: 4879
I tried your code and got the same error. I also tried to decompress the created file with a command-line utility (xz on Linux), but the output was garbage, which indicates that the problem is in how the file is created.
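A minimal sketch of what seems to go wrong, using a small numeric DataFrame as a hypothetical stand-in for the real data: lzma.compress accepts any bytes-like object, so it takes the raw memory buffer of the record array returned by to_records() instead of CSV text. The column names and values below are made up for illustration.

```python
import lzma

import pandas as pd

# Hypothetical sample data standing in for the contents of final.csv.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.5, 5.5, 6.5]})

records = df.to_records(index=False)

# The record array exposes its raw memory through the buffer protocol,
# so lzma.compress silently compresses binary data, not CSV text.
compressed = lzma.compress(records, format=lzma.FORMAT_XZ)
raw = lzma.decompress(compressed)

# Compare the decompressed payload with both candidate representations.
same_as_memory = raw == records.tobytes()
same_as_csv = raw == df.to_csv(index=False, header=False).encode("utf-8")
print("matches raw record memory:", same_as_memory)
print("matches CSV text:", same_as_csv)
```

If the payload matches the record memory rather than the CSV text, pd.read_csv is being asked to decode binary numbers as UTF-8, which would explain the UnicodeDecodeError.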
I changed the code to use .to_string().encode(), which forces a bytes object, and it works:
import lzma
import pandas as pd
df = pd.read_csv('somefile.txt', header=None)
with open('somez.xz', 'wb') as f:
    f.write(lzma.compress(df.to_string().encode(), format=lzma.FORMAT_XZ))
df_re = pd.read_csv('somez.xz')
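One caveat: to_string() writes a whitespace-aligned table that includes the index and header, so as far as I can tell read_csv will usually parse it back into a single text column. A sketch of an alternative that should round-trip the data is to compress the output of to_csv() instead, or to let pandas handle the xz compression itself. The sample data and temporary paths here are hypothetical.

```python
import lzma
import os
import tempfile

import pandas as pd

# Hypothetical sample data and output locations, for illustration only.
df = pd.DataFrame([[1, 4.5], [2, 5.5], [3, 6.5]])
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "final.xz")
path2 = os.path.join(tmp_dir, "final_pandas.xz")

# Option 1: compress the CSV text rather than the in-memory representation.
csv_bytes = df.to_csv(index=False, header=False).encode("utf-8")
with open(path, "wb") as f:
    f.write(lzma.compress(csv_bytes, format=lzma.FORMAT_XZ))

# read_csv infers xz compression from the .xz extension.
df_re = pd.read_csv(path, header=None)

# Option 2: let pandas write the compressed file directly.
df.to_csv(path2, index=False, header=False, compression="xz")
df_re2 = pd.read_csv(path2, header=None)

print(df_re)
```

Both options write plain comma-separated text inside the xz container, so the command-line xz tool should also be able to decompress the file into a readable CSV.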
Upvotes: 2