Reputation: 61
I have some code that reads through a list of HTML files, takes some information from each of these, and then presents them in a CSV file.
Part of this is using the number from the file's name to display which hour that line of data is from (e.g. file name is Report_H1.html). However, in order to get these to display in the correct order, I have padded out each number with 0's to make them all 2 decimal places.
When writing these to the CSV file, the leading 0's are removed. How can I prevent this from happening?
Edited code down to just the part that is causing my issue:
import re
import csv
import os
BASE_DIRECTORY = "C:\\Users\\MHE36\\Documents"
fileList = []
for (dirpath, dirnames, filenames) in os.walk(BASE_DIRECTORY):
for f in filenames:
if 'html' in str(f):
e = os.path.join(str(dirpath), str(f))
boxName = os.path.dirname(e)
box = os.path.basename(boxName)
fileDict = {}
fileList.append(fileDict)
match = re.search(r'\d+', f)
if match:
fileDict['Hour'] = str(match.group(0).zfill(2))
with open('C:\\Users\\MHE36\\Documents\\test.csv', 'w') as f:
w = csv.DictWriter(f, ['Hour'])
w.writeheader()
for fileDict in fileList:
w.writerow(fileDict)
print(fileList)
print(fileDict)
and this is the output of the print commands:
[{'Hour': '01'}, {'Hour': '10'}, {'Hour': '11'}, {'Hour': '12'}, {'Hour': '13'}, {'Hour': '14'}, {'Hour': '15'}, {'Hour': '16'}, {'Hour': '02'}, {'Hour': '03'}, {'Hour': '04'}, {'Hour': '05'}, {'Hour': '06'}, {'Hour': '07'}, {'Hour': '08'}, {'Hour': '09'}, {'Hour': '01'}, {'Hour': '02'}, {'Hour': '03'}, {'Hour': '04'}, {'Hour': '05'}]
{'Hour': '05'}
In the above print
s, I can see the leading zeros are being added as I want them. However, in the CSV file, I get the following:
Hour
1
I want it as:
Hour
01
Thanks in advance.
Upvotes: 0
Views: 6291
Reputation: 180
I found this in another answer and it worked for me:
df['column']=df['column'].apply('="{}"'.format)
Upvotes: 2
Reputation: 77902
Your issue has nothing to do with Python's csv module (as you would have found out reading the csv file directly in your code editor), but with Excel considering unquoted numerics as, well, numerics.
I don't use nor have Excel (linux user here) so I can't double-check, but passing quoting=csv.QUOTE_ALL
when creating the writer should fix the issue (this will add quote chars around all your numerics so excel should consider them as strings and leave them alone)
The doc: https://docs.python.org/3/library/csv.html#csv.writer
NB : if you only want to have SOME fields quoted, you'll have to do so manually when building the fileDict
.
Upvotes: 3