Presen

Reputation: 1857

Opening huge text file, unicode issue

I'm trying to open a text file and print its first line.

My code is:

dataFile = open('data/AllData_2000001_3000000.txt', 'r', encoding="latin-1")
print(dataFile.read(1000))

The input is

The bug is hitting

My output is

ÿþT h e  b u g  i s  h i t t i n g

Using iso-8859-1 gives the same result.
When I try utf-8 I'm getting the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What is my mistake?
Thanks!

Upvotes: 1

Views: 3523

Answers (1)

roippi

Reputation: 25954

That ÿþ is most likely the byte order mark (BOM) of a UTF-16 file: the bytes 0xFF 0xFE, which latin-1 decodes as ÿþ. Try specifying utf-16 as the encoding when opening the file.
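A minimal sketch of that suggestion, assuming the file really is UTF-16 with a little-endian BOM (which the ÿþ prefix suggests but does not prove). The helper name read_first_line and the temporary demo file are made up for illustration; with the real data you would pass the path from the question instead.

```python
import codecs
import os
import tempfile


def read_first_line(path, encoding="utf-16"):
    """Return the first line of a text file, without the trailing newline."""
    # The "utf-16" codec uses the BOM to pick the byte order and strips it,
    # so it does not show up in the decoded text.
    with open(path, "r", encoding=encoding) as f:
        return f.readline().rstrip("\n")


# Build a small stand-in file: a little-endian BOM followed by UTF-16-LE text.
# (Hypothetical demo data; it only mimics the file described in the question.)
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(codecs.BOM_UTF16_LE + "The bug is hitting\nsecond line\n".encode("utf-16-le"))
    demo_path = tmp.name

first_line = read_first_line(demo_path)

# Peek at the raw bytes to see where the two stray characters came from.
with open(demo_path, "rb") as f:
    bom = f.read(2)

os.remove(demo_path)

print(first_line)
print(bom, bom.decode("latin-1"))
```

Using readline() instead of read(1000) also avoids pulling more of a huge file into memory than the first line needs.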

Upvotes: 6
