Reputation: 337
I am running into an issue where I have some data in a .xls file(example below).
A B C D E F
John Smith 8:00AM 9:00AM 10:00AM 5:00PM 8.00
When I write it to a csv using the Python CSV module it comes out as
John,Smith,0.333333333,0.375,0.416666667,0.708333333,0.333333333
Now the interesting part is if I manually save the xls file as a MSDOS csv I get the desired output of
John,Smith,8:00 AM,9:00 AM,10:00 AM,5:00 PM,8:00
Here is the function I am running. Its a bit messy so I apologize in advance.
def csv_gen(filepath, saveto):
for files in glob.glob("*.xls"):
shutil.copy(filepath + "\\" + files, saveto)
with xlrd.open_workbook(files) as wb:
sh = wb.sheet_by_index(0)
newfile = saveto + files[:-4] + '.csv'
now = datetime.datetime.now()
dates = now.strftime("%m-%d-%Y")
filestart = [saveto + files]
time = [dates]
with open(newfile, 'wb') as f:
c = csv.writer(f,delimiter=',')
list = range(sh.nrows)
last = range(sh.nrows)[-1]
list.remove(0)
list.remove(3)
list.remove(2)
list.remove(1)
list.remove(last)
#Iterate through data and show values of the rows
for r in list:
lines = sh.row_values(r)
del lines[:4]
stuff = lines + filestart + time
#Remove blanks so csv doesnt have uneeded data
if lines[0] is '':
del stuff[:]
#Write to csv file with new data
if any(field.strip() for field in stuff):
c.writerow(stuff)
shutil.move(newfile, mergeloc)
I don't understand why this is coming out this way. I have tried adding the dialect flag to the csv writer to be 'excel', but the output is still the same.
Update:
If I save the document as a csv as so workBook.SaveAs(test.csv, 24)
The encoding 24 is for MSDOS. I get the desired output of
John,Smith,8:00 AM,9:00 AM,10:00 AM,5:00 PM,8:00
But when the csv module grabs it and removes some blank rows and deletes a few things at the end it writes the rows out and that is when I get the decimals again
John,Smith,0.333333333,0.375,0.416666667,0.708333333,0.333333333
The purpose of the csv module is to modify rows and delete blank rows.
Update
for r in list:
cells = sh.row_values(r)
csv_row = cells[0] for col_value in cells[1:]:
csv_row.append(datetime.time(*xlrd.xldate_as_tuple(col_value, 0)[3:]))
Added the row_values to just return the value of the cell and not xldata:0.33333. Then added a * to make the pass a positional argument.
Upvotes: 3
Views: 1412
Reputation: 13088
That doesn't look like a problem in csv
module to me, it looks like something is going wrong in reading the .xls
file.
According to the xlrd
docs dates in Excel worksheets are a pretty awful mess
Dates in Excel spreadsheets
In reality, there are no such things. What you have are floating point numbers and pious hope. There are several problems with Excel dates:
I did a quick test with a new .xls
file with the contents you provided in there. Python has no problems reading the file, although I don't have Excel on my machine, I made the file in LibreOffice and saved it as .xls. Even so, the fields come out as unicode strings on the python side.
You should be able to use the xlrd.xldate_as_tuple(xldate, datemode)
(link) to convert the float into a python date tuple. Doing
print xlrd.xldate_as_tuple(0.333333333,0)
prints out
(0, 0, 0, 8, 0, 0)
UPDATE
So you probably want something like the following, changing the for
loop that goes over your rows
...
for r_idx in list:
cells = sh.row(r)
csv_row = [cells[0]] # the first row value should be ok as just a string
for col_value in cells[1:]:
# add the date time column values to the converted csv row
csv_row.append( datetime.time(xlrd.xldate_as_tuple(col_value, 0)[3:]) )
...
Upvotes: 2