Jack

Reputation: 538

How can I decode a string read from file?

I read a file into a string in Python, and it shows up as encoded (I'm not sure which encoding).

query = ""
with open(file_path) as f:
    for line in f.readlines():
        print(line)
        query += line
query

The lines all print out in English as expected

select * from table

but the query at the end shows up like

ÿþd\x00r\x00o\x00p\x00 \x00t\x00a\x00b\x00l\x00e\x00 

What's going on?

Upvotes: 0

Views: 1791

Answers (3)

Santhosh Boggarapu

Reputation: 14

# Python 2 only: str.decode() and the unicode type do not exist in Python 3.
with open(filePath) as f:
    fileContents = f.read()
    if isinstance(fileContents, str):
        # Drop any non-ASCII bytes, then encode back to a byte string.
        fileContents = fileContents.decode('ascii', 'ignore').encode('ascii')
    elif isinstance(fileContents, unicode):
        fileContents = fileContents.encode('ascii', 'ignore')

Upvotes: 0

Dima Tisnek

Reputation: 11781

Agreed with Carlos, the encoding seems to be UTF-16LE. There seems to be a BOM present (the leading ÿþ), so encoding="utf-16" should be able to autodetect whether the data is little- or big-endian.
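To illustrate the BOM point, here is a minimal sketch that works on an in-memory byte string rather than your file. The sample text "drop table" is taken from your output. Decoding as latin-1 is only my assumption about what your platform's default encoding did to the bytes.

```python
import codecs

# Build UTF-16LE bytes with a BOM, similar to what the file appears to contain.
raw = codecs.BOM_UTF16_LE + "drop table".encode("utf-16-le")

# Decoding with a single-byte codec (latin-1 here, as an assumption) keeps the
# BOM as "ÿþ" and leaves a NUL character after every ASCII letter.
misread = raw.decode("latin-1")

# The "utf-16" codec reads the BOM, picks the byte order, and strips the BOM.
text = raw.decode("utf-16")

# The same codec also handles big-endian data, as long as the BOM is present.
raw_be = codecs.BOM_UTF16_BE + "drop table".encode("utf-16-be")
text_be = raw_be.decode("utf-16")

print(repr(misread))
print(repr(text), repr(text_be))
```

Both text and text_be come out as the plain string, which is why "utf-16" is a safer choice than hard-coding "utf-16-le" when a BOM is present.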

Idiomatic Python would be:

with open(file_path, encoding="...") as f:
    for line in f:
        pass  # do something with this line

In your case, you append each line to query, so the entire code can be reduced to:

query = open(file_path, encoding="...").read()
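If you want to check this end to end, the following is a rough sketch that writes a small UTF-16 file to a temporary directory and reads it back. The helper name read_query, the file name query.sql, and the default of "utf-16" are all hypothetical choices for illustration. Your real file may need a different encoding.

```python
import os
import tempfile


def read_query(file_path, encoding="utf-16"):
    """Read the whole file as text; the default encoding is an assumption."""
    with open(file_path, encoding=encoding) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample file standing in for the real one.
    sample_path = os.path.join(tmp, "query.sql")
    with open(sample_path, "w", encoding="utf-16") as f:
        f.write("select * from table\n")  # the utf-16 codec writes a BOM first

    query = read_query(sample_path)

print(repr(query))
```

Using a with block here also closes the file promptly, which the one-liner leaves to the garbage collector.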

Upvotes: 3

Carlos

Reputation: 81

It seems like UTF-16 data. Can you try decoding it with utf-16?

# Open in binary mode so the raw bytes can be decoded explicitly.
with open(file_path, 'rb') as f:
    query = f.read().decode('utf-16')
print(query)

Upvotes: 2
