Mohit Sharma
Mohit Sharma

Reputation: 189

How to correct the encoding while creating excel file from 'utf-8' data using python

I am trying to create an excel file using python from a list of dictionaries. Initially I was getting an error of improper encoding. So I decoded my data to 'utf-8' format. Now after the creation of excel, when I checked the values in each field, their format has been changed to text only. Below are the stpes I used while performing this activity with a snippet of code.

1.>I got error of improper encoding while creating excel file as my data had some 'ascii' values in it. Error snippet:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 16: ordinal not in range(128)

2.>To remove the error of improper encoding, I inserted a decode() function while reading my input csv file. Snippet of code while decoding to 'utf-8':

data = []
with open(datafile, "r") as f:
    header = f.readline().split(',')
    counter = 0
    for line in f:
        line = line.decode('utf-8')

        fields = line.split(',')
        entry = {}
        for i,value in enumerate(fields):
            entry[header[i].strip()] = value.strip()
        data.append(entry)
        counter += 1

return data

3.>After inserting decode() funtion, I created my excel file using below code:

ordered_list= dicts[0].keys() 

wb=Workbook("New File.xlsx")
ws=wb.add_worksheet("Unique")

first_row=0
for header in ordered_list:
   col=ordered_list.index(header)
   ws.write(first_row,col,header) 

row=1
for trans in dicts:
   for _key,_value in trans.items():
       col=ordered_list.index(_key)
       ws.write(row,col,_value)
   row+=1 #enter the next row
wb.close()

But after creation of excel, all the values in each field of excel is coming with text format and not their original format (some datetime values, decimal values etc.). How do I make sure to get that the data format does not change from the input data format I read using input csv file?

Upvotes: 2

Views: 18778

Answers (1)

Alastair McCormack
Alastair McCormack

Reputation: 27714

When reading text files you should pass the encoding to open() so that it's automatically decoded for you.

Python 2.x:

with io.open(datafile, "r", encoding="utf-8") as f:

Python 3.x:

with open(datafile, "r", encoding="utf-8") as f:

Now each line read will be a Unicode string.

As you're reading a CSV file, you may want to consider the CSV module, which understands CSV dialects. It will automatically return dictionaries per row, keyed by the header. In Python 3, it's just the csv module. In Python, the CSV module is broken with non-ASCII. Use https://pypi.python.org/pypi/unicodecsv/0.13.0

Once you have clean Unicode strings, you can proceed to store the data.

The Excel format requires that you tell it what kind of data you're storing. If you put a timestamp string into a cell, it will think it's just a string. The same applies if you insert a string of an integer.

Therefore, you need to convert the value type in Python before adding to the workbook.

Convert to decimal string to float:

 my_decimal = float(row["height"])
 ws.write(row,col,my_decimal)

Create datetime field from string. Assuming string is "Jun 1 2005 1:33PM":

date_object = datetime.strptime(my_date, '%b %d %Y %I:%M%p')
ws.write(row,col,date_object)

Upvotes: 1

Related Questions