Reputation: 143
I am trying to execute this code snippet in Python 3.8:
def load_rightprob(self, rightprob_file):
    ''' dictionary with # people keys with # actions '''
    rightProb = {}
    for line in open(rightprob_file):
        items = line.strip().split("\t")
        if len(items) != len(self.action_qid_dict) + 1:
            continue
        pid = int(items[0])
but I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
I tried for line in open(rightprob_file, 'rb'): instead, but then the following line (the split) fails with this error:
TypeError: a bytes-like object is required, not 'str'
Can somebody please suggest how to fix this? I am reading from a .txt file where each line is an ID followed by 377 columns of probability values associated with that ID.
Upvotes: 0
Views: 1053
Reputation: 308530
It's very unusual for a text file to start with 0xff. Because of that, it's sometimes placed deliberately at the start of the file as part of a Byte Order Mark (BOM) for Unicode, particularly on Windows. As you can see in the table in the link, only two Unicode encodings have a BOM that starts with 0xff: UTF-16 and UTF-32, both little endian. Of the two, UTF-16 is far more commonly encountered.
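If you would rather check than guess, the reasoning above can be sketched as a small BOM sniffer. This is only an illustration: sniff_bom_encoding and its default argument are hypothetical names, and it only recognizes files that actually carry a BOM. One known ambiguity: a UTF-16 little-endian file whose first real character is U+0000 has the same first four bytes as a UTF-32 little-endian BOM, which should be rare in a text file.

```python
import codecs

# Ordered so the 4-byte UTF-32 marks are tested before the 2-byte UTF-16
# marks (BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom_encoding(path, default="utf-8"):
    """Hypothetical helper: guess a codec name from the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # No BOM found: fall back to the caller's assumption.
    return default
```

Calling sniff_bom_encoding(rightprob_file) on your file and passing the result as the encoding argument to open() would then pick a codec that also consumes the BOM.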
So open your file like this:
with open(rightprob_file, 'r', encoding='utf_16_le') as f:
    for line in f:
I added the with so that the file is automatically closed when you're done; leaving it open was a bug in your original code.
The first character read from the file will be u'\ufeff' and can be thrown away or otherwise ignored.
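Note that '\ufeff' is not whitespace, so strip() will not remove it, and int() on the first ID would raise a ValueError if the mark is left in place. An alternative that avoids handling it yourself is the plain 'utf-16' codec, which reads the BOM to pick the byte order and drops it from the decoded text. Below is a minimal sketch under assumptions: read_prob_rows and n_columns are hypothetical stand-ins for your method and len(self.action_qid_dict), and converting the remaining columns to float is my guess at what the rest of your loop does.

```python
def read_prob_rows(path, n_columns):
    """Hypothetical stand-in for load_rightprob: returns {id: [probabilities]}."""
    rows = {}
    # 'utf-16' uses the BOM to detect byte order and removes it from the text,
    # so the first ID can be parsed like any other.
    with open(path, "r", encoding="utf-16") as f:
        for line in f:
            items = line.strip().split("\t")
            if len(items) != n_columns + 1:
                continue  # skip malformed lines, as the original code does
            # Assumption: every column after the ID is a float probability.
            rows[int(items[0])] = [float(x) for x in items[1:]]
    return rows
```

For your file this would be called as read_prob_rows(rightprob_file, 377).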
Upvotes: 1