Reputation: 538
I read a file into a string in Python, and it shows up as encoded (I'm not sure which encoding).
query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query
The lines all print out in English as expected:
select * from table
but the final value of query shows up like this:
ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00
What's going on?
Upvotes: 0
Views: 1791
Reputation: 14
# Python 2 only: in Python 3, str has no .decode() and the unicode type does not exist.
with open(filePath) as f:
    fileContents = f.read()
if isinstance(fileContents, str):
    # Drop the non-ASCII bytes, then encode back to a byte string.
    fileContents = fileContents.decode('ascii', 'ignore').encode('ascii')
elif isinstance(fileContents, unicode):
    fileContents = fileContents.encode('ascii', 'ignore')
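A Python 3 sketch of the same ascii/ignore idea, assuming the file has been read in binary mode into bytes (the sample bytes below are made up to mirror the question), shows what it does to UTF-16 input: the two BOM bytes are dropped, but the 0x00 bytes are valid ASCII and stay in the string.

```python
# Made-up sample: a UTF-16LE byte-order mark followed by UTF-16LE text,
# i.e. what open(file_path, "rb").read() would return for such a file.
raw = b"\xff\xfe" + "drop table".encode("utf-16-le")

# Python 3 equivalent of the ascii/ignore approach: discard every byte above 127.
stripped = raw.decode("ascii", "ignore")

# 0xFF and 0xFE are removed, but each character's 0x00 high byte survives.
print(repr(stripped))  # 'd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00'
```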
Upvotes: 0
Reputation: 11781
Agreed with Carlos, the encoding seems to be UTF-16LE. There seems to be a BOM present, so encoding="utf-16" will be able to autodetect whether it's little- or big-endian.
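A quick way to check this diagnosis, using made-up bytes rather than the asker's file: a BOM plus UTF-16LE text, decoded with a single-byte codec (latin-1 here, standing in for a Windows default such as cp1252, which is an assumption about the asker's setup), gives the same ÿþ and \x00 pattern as in the question, while "utf-16" reads either byte order from the BOM.

```python
text = "drop table"

# Made-up file contents: a BOM followed by the text in each byte order.
little = b"\xff\xfe" + text.encode("utf-16-le")
big = b"\xfe\xff" + text.encode("utf-16-be")

# A single-byte codec turns every byte into one character, BOM included.
print(repr(little.decode("latin-1")))
# 'ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00'

# "utf-16" uses the BOM to pick the byte order and strips it.
print(repr(little.decode("utf-16")))  # 'drop table'
print(repr(big.decode("utf-16")))     # 'drop table'

# An explicit "utf-16-le" keeps the BOM as a U+FEFF character.
print(repr(little.decode("utf-16-le")))  # '\ufeffdrop table'
```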
Idiomatic Python would be:
with open(file_path, encoding="...") as f:
    for line in f:
        # do something with this line
        pass
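A self-contained run of that pattern, assuming encoding="utf-16" and using a throwaway temporary file in place of the real file_path (the file content is hypothetical):

```python
import os
import tempfile

# Throwaway file written as UTF-16; Python writes a BOM for the "utf-16" codec.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-16", suffix=".sql", delete=False
) as tmp:
    tmp.write("select *\nfrom table\n")
    file_path = tmp.name

lines = []
with open(file_path, encoding="utf-16") as f:
    for line in f:  # iterate the file object directly; no readlines() needed
        lines.append(line.rstrip("\n"))

os.remove(file_path)
print(lines)  # ['select *', 'from table']
```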
In your case, you just append each line to query, so the entire code can be reduced to:
query = open(file_path, encoding="...").read()
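The one-liner never closes the file explicitly; it is closed only when the file object is garbage-collected (immediately in CPython, but not guaranteed on other implementations). pathlib.Path.read_text opens, reads and closes the file in one call. A sketch, again assuming "utf-16" and a temporary file standing in for file_path:

```python
import tempfile
from pathlib import Path

# Throwaway file standing in for the asker's file_path (hypothetical content).
path = Path(tempfile.mkdtemp()) / "query.sql"
path.write_text("drop table", encoding="utf-16")  # writes a BOM first

# Opens, decodes and closes the file in a single call.
query = path.read_text(encoding="utf-16")
print(repr(query))  # 'drop table'

# Clean up the temporary file and directory.
path.unlink()
path.parent.rmdir()
```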
Upvotes: 3
Reputation: 81
It seems like UTF-16 data. Can you try decoding it with utf-16?
# File objects have no .decode(); read raw bytes in binary mode, then decode them.
with open(file_path, 'rb') as f:
    query = f.read().decode('utf-16')
print(query)
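If the encoding varies from file to file, one option is to look at the first bytes for a byte-order mark before decoding. The helper names sniff_bom and read_text_with_bom are hypothetical, and the utf-8 fallback for files without a BOM is an assumption; a file with no BOM cannot be identified this way.

```python
import codecs
from typing import Optional

# UTF-32 BOMs go first: BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw: bytes) -> Optional[str]:
    """Return a codec name for a leading BOM, or None if there is none."""
    for bom, codec in _BOMS:
        if raw.startswith(bom):
            return codec
    return None


def read_text_with_bom(path: str, fallback: str = "utf-8") -> str:
    """Read a file as bytes and decode it using its BOM, else `fallback`."""
    with open(path, "rb") as f:
        raw = f.read()
    return raw.decode(sniff_bom(raw) or fallback)


# Usage with made-up bytes shaped like the question's file:
sample = b"\xff\xfe" + "drop table".encode("utf-16-le")
print(sniff_bom(sample))                            # utf-16
print(sample.decode(sniff_bom(sample) or "utf-8"))  # drop table
```

Each codec name returned here ("utf-16", "utf-32", "utf-8-sig") strips the BOM while decoding, so the resulting string starts with the actual text.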
Upvotes: 2