Raj

Reputation: 223

Python Throwing "'utf8' codec can't decode byte 0xd0 in position 0" Error

I am trying to load an existing worksheet and import a text file (comma-separated values) into it. Screenshots of both are shown below.

Excel Sheet:

[screenshot of the Excel sheet]

Text File:

[screenshot of the text file]

I am using the code shown below:

    # Import the modules needed for this operation
    import glob
    import csv
    from openpyxl import load_workbook
    import xlwt

    # Read the text file(s) with the csv module, setting the delimiter
    for filename in glob.glob("E:\Scripting_Test\Phase1\*.txt"):
        spamReader = csv.reader((open(filename, 'rb')), delimiter=',')


        #read the excel file and using xlwt modules and set the active sheet
        wb = load_workbook(filename=r"E:\Scripting_Test\SeqTem\Seq0001.xls")
        ws = wb.worksheets(0)


        #write the data that is in text file to excel file
        for rowx, row in enumerate(spamReader):
            for colx, value in enumerate(row):
                ws.write(rowx, colx, value)

        wb.save()

I am getting the following error message:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

One more question: how can I tell Python to import the text data starting at cell A3 in the Excel sheet?

Upvotes: 12

Views: 20134

Answers (3)

Wonil

Reputation: 6727

openpyxl only handles the OOXML formats (xlsx/xlsm). Please save the workbook in xlsx format instead of xls using Excel.
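One hint that the workbook, rather than the text file, is what fails to load: 0xd0 is the first byte of the signature that legacy .xls (OLE2 compound) files start with, whereas .xlsx files are ZIP archives that start with "PK". Below is a minimal sketch for checking which kind of file you actually have. The function names are made up for illustration and are not part of openpyxl.

```python
# Signatures of the two container formats Excel workbooks come in.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx/.xlsm (ZIP archive)


def classify_signature(header):
    """Guess the workbook container from the first bytes of a file."""
    if header.startswith(OLE2_MAGIC):
        return "xls"
    if header.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"


def sniff_workbook(path):
    """Hypothetical helper: read the first 8 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_signature(fh.read(8))
```

If `sniff_workbook` reports "xls" for the template, renaming the file to .xlsx will not help; it has to be re-saved or converted.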

If you want to convert the xls file to xlsx in code, try one of the following options:

  1. On Windows, you can use the excelcnv tool to convert xls to xlsx.
  2. On Linux, please check this article.
  3. Or you could convert to xlsx using xlrd in Python. Please check this Q&A.
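Once the template is an .xlsx file, the import itself, including the A3 starting cell asked about in the question, could look roughly like the sketch below. It assumes the text file is UTF-8 encoded and that the data belongs on the active sheet. `iter_cells` and `write_csv_into_sheet` are hypothetical helper names. Note that openpyxl rows and columns are 1-based, so A3 is row 3, column 1.

```python
import csv


def iter_cells(rows, start_row=3, start_col=1):
    """Yield (row, column, value) triples, offset so the first value lands on A3."""
    for r, row in enumerate(rows, start=start_row):
        for c, value in enumerate(row, start=start_col):
            yield r, c, value


def write_csv_into_sheet(csv_path, xlsx_path, start_row=3, start_col=1):
    """Copy a comma-separated text file into an existing .xlsx workbook."""
    # Imported here so the helper above works without openpyxl installed.
    from openpyxl import load_workbook

    wb = load_workbook(xlsx_path)
    ws = wb.active  # assumption: the data goes on the active sheet
    # Assumption: the text file is UTF-8; adjust `encoding` if it is not.
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for r, c, value in iter_cells(csv.reader(fh), start_row, start_col):
            ws.cell(row=r, column=c, value=value)
    wb.save(xlsx_path)  # save() needs a target filename
```

Compared with the code in the question, this uses `ws.cell(...)` instead of `ws.write(...)`, which is an xlwt method, and passes a filename to `wb.save()`.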

Upvotes: 2

Max Resnick

Reputation: 33

Hi, are you sure the file doesn't have a UTF-8 BOM?

You might try the UTF-8 BOM codec. Generally, Windows plus UTF-8 can be a bit troublesome, although the byte shown in the error may not be the BOM.
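A quick way to test the BOM theory, sketched in Python 3 under the assumption that you can read the raw bytes of the text file (the helper names are illustrative only). The UTF-8 BOM is the byte sequence EF BB BF, so a file whose first byte is 0xd0 does not start with one.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data):
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    return data.decode("utf-8-sig")
```

When reading a file directly, `open(path, encoding="utf-8-sig")` applies the same codec.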

Upvotes: 1

Adam Morris

Reputation: 8545

Unicode encoding confuses me, but can't you force the value to ignore invalid bytes by saying:

value = unicode(value, errors='ignore')

Here is a great answer for more reading on unicode: unicode().decode('utf-8', 'ignore') raising UnicodeEncodeError
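The line above is Python 2. In Python 3 there is no `unicode()` built-in, and the rough equivalent is to call `decode` on the bytes. A small sketch with made-up function names; keep in mind that ignoring errors silently drops data, so it can hide the real cause instead of fixing it.

```python
def decode_lenient(raw, encoding="utf-8"):
    """Decode bytes, silently dropping any byte that is not valid."""
    return raw.decode(encoding, errors="ignore")


def decode_visible(raw, encoding="utf-8"):
    """Decode bytes, marking each invalid byte with U+FFFD instead."""
    return raw.decode(encoding, errors="replace")


print(decode_lenient(b"\xd0abc"))  # prints: abc
```

Using `errors="replace"` is often the safer choice while debugging, because the replacement character shows where the bad bytes were.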

Upvotes: 4
