Reputation: 21
I am trying to load a JSON file whose content is in Chinese, and I am getting a UnicodeDecodeError from the utf-8 codec. Is there any way to use try-except without losing all the content from the file?
    import json

    def load_from_json(fin):
        datas = []
        for line in fin:
            data = json.loads(line)
            datas.append(data)
        return datas
Upvotes: 2
Views: 279
Reputation: 387
It does look like the file might not actually be UTF-8, so checking the encoding is a good place to start, as the other answer suggests. However, to answer your actual question:
Is there any way to use try-except without losing all the content from the file?
Yes, there are two ways. The first is to pass errors="replace" to open() in addition to encoding="utf8". Each undecodable byte then becomes the replacement character U+FFFD (�) and reading continues as before. You can then wrap the json load in try/except and go from there. This is probably the simplest option, but because it silently alters the data it is not a very good long-term solution.
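A minimal sketch of this first option, in case it helps. The function name load_with_replacement and the throwaway demo file are made up for illustration, and it assumes the file holds one JSON document per line, as in the question:

```python
import json
import os
import tempfile


def load_with_replacement(path):
    """Load a JSON-per-line file, turning undecodable bytes into U+FFFD."""
    records, bad_lines = [], []
    # errors="replace" keeps the decoder from raising UnicodeDecodeError.
    with open(path, encoding="utf8", errors="replace") as fin:
        for lineno, line in enumerate(fin, start=1):
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # The line decoded, but it is not valid JSON.
                bad_lines.append(lineno)
    return records, bad_lines


# Demo file (hypothetical data): a clean line, a line with invalid UTF-8
# bytes inside a string, and a line with truncated JSON.
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"text": "你好"}\n'.encode("utf8"))
    tmp.write(b'{"text": "\xff\xfe"}\n')
    tmp.write(b'{"text": \n')
    demo_path = tmp.name

records, bad_lines = load_with_replacement(demo_path)
os.remove(demo_path)
```

Note that the second line still loads, only with U+FFFD where the bad bytes were, so nothing tells you afterwards which records were damaged unless you look for that character yourself.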
A better way is to open the file in binary mode and do the decoding line by line, something like:

    import json
    import sys

    def load_from_json(fin):
        # fin must be opened in binary mode, e.g. open(path, "rb")
        datas = []
        for i, line in enumerate(fin):
            try:
                data = json.loads(line.decode("utf8"))
            except UnicodeDecodeError as e:
                # Report the undecodable line and carry on with the rest
                print(f"line {i}, {line!r}: {e}", file=sys.stderr)
            else:
                datas.append(data)
        return datas
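For completeness, here is a sketch of the calling side, since the file has to be opened with "rb" for the per-line decode to work. This variant collects the skipped lines instead of printing them. The name load_json_lines, the demo file, and the gb18030 retry are all assumptions for illustration; your file may well be in a different encoding:

```python
import json
import os
import tempfile


def load_json_lines(path):
    """Decode a JSON-per-line file one line at a time, keeping bad lines."""
    records, skipped = [], []
    # "rb" yields bytes, so one undecodable line cannot abort the whole read.
    with open(path, "rb") as fin:
        for lineno, raw in enumerate(fin, start=1):
            try:
                records.append(json.loads(raw.decode("utf8")))
            except (UnicodeDecodeError, json.JSONDecodeError) as exc:
                skipped.append((lineno, raw, exc))
    return records, skipped


# Demo file (hypothetical data): line 2 is GBK-encoded, the others are UTF-8.
with tempfile.NamedTemporaryFile("wb", suffix=".jsonl", delete=False) as tmp:
    tmp.write('{"id": 1, "text": "你好"}\n'.encode("utf8"))
    tmp.write('{"id": 2, "text": "你好"}\n'.encode("gbk"))
    tmp.write('{"id": 3, "text": "世界"}\n'.encode("utf8"))
    demo_path = tmp.name

records, skipped = load_json_lines(demo_path)
os.remove(demo_path)

# Assumption: the skipped lines are in a Chinese legacy encoding, so retry
# them with gb18030. Pick whatever encoding your data actually uses.
recovered = [json.loads(raw.decode("gb18030")) for _, raw, _ in skipped]
```

Because the raw bytes of each skipped line are kept, you can inspect them or retry them later rather than losing them.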
Upvotes: 0
Reputation: 8086
This may be a character-encoding issue. There is a library called ftfy ("fixes text for you") that may be able to detect and repair text that was decoded with the wrong encoding.
Upvotes: 1