csv module writes time as decimal

Question

I am running into an issue where I have some data in a .xls file(example below).

  A            B           C         D         E        F
John Smith     8:00AM      9:00AM    10:00AM    5:00PM  8.00

When I write it to a csv using the Python CSV module it comes out as

John,Smith,0.333333333,0.375,0.416666667,0.708333333,0.333333333

Now the interesting part is if I manually save the xls file as a MSDOS csv I get the desired output of

John,Smith,8:00 AM,9:00 AM,10:00 AM,5:00 PM,8:00

Here is the function I am running. Its a bit messy so I apologize in advance.

def csv_gen(filepath, saveto):
    for files in glob.glob("*.xls"):
        shutil.copy(filepath + "\" + files, saveto)
        with xlrd.open_workbook(files) as wb:
            sh = wb.sheet_by_index(0)
            newfile = saveto + files[:-4] + '.csv'
            now = datetime.datetime.now()
            dates = now.strftime("%m-%d-%Y")
            filestart = [saveto + files]
            time = [dates]
            with open(newfile, 'wb') as f:
                c = csv.writer(f,delimiter=',')
                list =  range(sh.nrows)
                last = range(sh.nrows)[-1]
                list.remove(0)
                list.remove(3)
                list.remove(2)
                list.remove(1)
                list.remove(last)
                #Iterate through data and show values of the rows
                for r in list:
                    lines = sh.row_values(r)
                    del lines[:4]
                    stuff = lines + filestart + time
                    #Remove blanks so csv doesnt have uneeded data
                    if lines[0] is '':
                        del stuff[:]
                    #Write to csv file with new data
                    if any(field.strip() for field in stuff):
                        c.writerow(stuff)
            shutil.move(newfile, mergeloc)

I don't understand why this is coming out this way. I have tried adding the dialect flag to the csv writer to be 'excel', but the output is still the same.

Update:

If I save the document as a csv as so workBook.SaveAs(test.csv, 24) The encoding 24 is for MSDOS. I get the desired output of

John,Smith,8:00 AM,9:00 AM,10:00 AM,5:00 PM,8:00

But when the csv module grabs it and removes some blank rows and deletes a few things at the end it writes the rows out and that is when I get the decimals again

John,Smith,0.333333333,0.375,0.416666667,0.708333333,0.333333333

The purpose of the csv module is to modify rows and delete blank rows.

Update

 for r in list: 
     cells = sh.row_values(r) 
     csv_row = cells[0] for col_value in cells[1:]:
         csv_row.append(datetime.time(*xlrd.xldate_as_tuple(col_value, 0)[3:]))

Added the row_values to just return the value of the cell and not xldata:0.33333. Then added a * to make the pass a positional argument.

Matti Lyra · Accepted Answer

That doesn't look like a problem in csv module to me, it looks like something is going wrong in reading the .xls file.

According to the xlrd docs dates in Excel worksheets are a pretty awful mess

Dates in Excel spreadsheets

In reality, there are no such things. What you have are floating point numbers and pious hope. There are several problems with Excel dates:

I did a quick test with a new .xls file with the contents you provided in there. Python has no problems reading the file, although I don't have Excel on my machine, I made the file in LibreOffice and saved it as .xls. Even so, the fields come out as unicode strings on the python side.

You should be able to use the xlrd.xldate_as_tuple(xldate, datemode) (link) to convert the float into a python date tuple. Doing

print xlrd.xldate_as_tuple(0.333333333,0)

prints out

(0, 0, 0, 8, 0, 0)

UPDATE

So you probably want something like the following, changing the for loop that goes over your rows

...
for r_idx in list:
    cells = sh.row(r)
    csv_row = [cells[0]] # the first row value should be ok as just a string
    for col_value in cells[1:]:
        # add the date time column values to the converted csv row
        csv_row.append( datetime.time(xlrd.xldate_as_tuple(col_value, 0)[3:]) )
    ...

csv module writes time as decimal

Answers (1)

Related Questions