Mukesh Suthar

Reputation: 69

Getting UnicodeDecodeError while reading an Excel file in Tornado (Python)

I'm using Postman to send an Excel file, which I then read in Tornado.


Tornado code

self.request.files['1'][0]['body'].decode()

If I send a .csv file, the above code works.


If I send an .xlsx file, I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 10: invalid start byte


request.files returns the file, but its body is of type bytes. To convert bytes to str I used decode(), which works only for .csv and not for .xlsx.

I also tried decode('utf-8'), but still no luck.

I've searched but couldn't find any issue mentioning this 0x87 problem.

Upvotes: 0

Views: 1625

Answers (3)

Shantanu Verma

Reputation: 11

I faced the same issue and this worked for me.

    import io

    import pandas as pd

    # Pass the raw upload bytes to pandas; do not decode them first
    df = pd.read_excel(io.BytesIO(self.request.files['1'][0]['body']))
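
For context, this works because an .xlsx file is a ZIP archive rather than text, so the upload should stay as bytes. Below is a minimal sketch, using only the standard library, of how a handler might tell the two kinds of upload apart before deciding whether to call decode(). The helper names `is_zip_upload` and `make_sample_zip` are made up for illustration and are not part of Tornado or pandas, and the sample archive is only a stand-in for a real workbook.

```python
import io
import zipfile


def is_zip_upload(body: bytes) -> bool:
    """Return True if the bytes form a ZIP archive, as .xlsx files do."""
    return zipfile.is_zipfile(io.BytesIO(body))


def make_sample_zip() -> bytes:
    """Build a tiny in-memory ZIP standing in for an uploaded .xlsx body."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/workbook.xml", "<workbook/>")
    return buffer.getvalue()


csv_body = b"name,score\nalice,10\nbob,7\n"
xlsx_like_body = make_sample_zip()

for label, body in (("csv", csv_body), ("xlsx-like", xlsx_like_body)):
    if is_zip_upload(body):
        # Binary container: hand the bytes to an Excel reader unchanged
        print(label, "-> binary container, keep as bytes")
    else:
        # Plain text: decoding is the right step here
        print(label, "-> text, decoded:", repr(body.decode("utf-8")))
```

In a real handler, the `body` value would be `self.request.files['1'][0]['body']` from the question.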

Upvotes: 1

xyres

Reputation: 21854

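The reason is that an .xlsx file is not plain text: it is a ZIP-compressed binary container, so its bytes are not valid UTF-8 and cannot be decoded to a string the way a .csv can. For text uploads (such as .csv) that are not UTF-8, you'll need to use the original encoding to decode the file.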

There's no guaranteed way of finding out the encoding of a file programmatically. I'm guessing you're making this application for general users and so you will keep encountering files with different and unexpected encodings.

A good way to deal with this is by trying to decode using multiple encodings, in case one fails. Example:

# iso-8859-1 maps every byte to a character and never fails, so it goes last
encodings = ['utf-8', 'windows-1251', 'windows-1252', 'iso-8859-1']

for encoding in encodings:
    try:
        decoded_file = self.request.files['1'][0]['body'].decode(encoding)
    except UnicodeDecodeError:
        # this will run when the current encoding fails
        # just ignore the error and try the next one
        pass
    else:
        # this will run when an encoding passes
        # break the loop
        # it is also a good idea to re-encode the 
        # decoded files to utf-8 for your purpose
        decoded_file = decoded_file.encode("utf8")
        break
else:
    # this will run when the for loop ends
    # without successfully decoding the file
    # now you can return an error message
    # to the user asking them to change 
    # the file encoding and re upload
    self.write("Error: Unidentified file encoding. Re-upload with UTF-8 encoding")
    return

# when the program reaches here, it means 
# you have successfully decoded the file 
# and you can access it from `decoded_file` variable

Here's a list of some common encodings: What is the most common encoding of each language?
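
The loop above can also be packaged as a small reusable function. This is only a sketch for text uploads such as .csv: `decode_with_fallback` and `DEFAULT_ENCODINGS` are hypothetical names, and the candidate list is an assumption you would adjust for your users. Note that iso-8859-1 accepts any byte sequence, so it only makes sense as the last candidate.

```python
from typing import Iterable, Tuple

# Assumed candidate list; iso-8859-1 never raises, so it is tried last
DEFAULT_ENCODINGS = ("utf-8", "windows-1251", "windows-1252", "iso-8859-1")


def decode_with_fallback(
    data: bytes, encodings: Iterable[str] = DEFAULT_ENCODINGS
) -> Tuple[str, str]:
    """Return (decoded_text, encoding_used) for the first encoding that works."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # This candidate failed; move on to the next one
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# "café" encoded as windows-1252 is not valid UTF-8, so the second candidate is used
text, used = decode_with_fallback("café".encode("windows-1252"), ["utf-8", "windows-1252"])
print(used, text)
```

A successful decode only means the bytes were valid in that encoding, not that it was the encoding the author used, so the order of the candidates matters.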

Upvotes: 1

sudonym

Reputation: 4028

Try this one, following the suggestions provided here:

self.request.files['1'][0]['body'].decode('iso-8859-1').encode('utf-8')

Upvotes: 0
