How do you prevent pandas to_csv from formatting numbers as numeric in order to preserve leading 0s?

Question

I am using the to_csv method to save a dataframe out as a csv. However, the leading 0s are being removed.

For example, this is what the dataframe looks like prior to outputting it:

And as you can see in the 6th column "Purchase Order", the data is numeric but contains leading 0s. I would like to preserve that formatting and keep the leading 0s when outputting to csv and viewing in Excel, however in the csv output the leading 0s are removed:

Is there a way to prevent this? This is the command I am using:

out1.to_csv(outfile + '.csv', index=False, float_format='%f')

Tom · Accepted Answer

I believe this is more of an Excel issue rather than pandas. If you look at your csv file with a text-editor, (I am guessing) you would still see the numbers with the leading zeroes. I say this because I have experienced similar issues with Excel assuming the format of items within CSV files, namely dates, and producing irreversible or annoying conversions.

In this case, it looks like Excel is seeing the Purchase Order as a number, and then formatting them as such. And it does this immediately on opening the file, so you can't go back and format the cell or anything (the zeroes are lost).

I think 2 simple alternatives would be:

save in .xlsx format with df.to_excel(). This worked for me with some fake-leading zeroes data. If you only intend on using the data in Excel, maybe this is best.
Edit your purchase order column to make it clearly text. Like appending a non-numeric character (which could later be removed):

df['Purchase Order'] = '#' + df['Purchase Order']
# df.to_csv('data.csv')

How do you prevent pandas to_csv from formatting numbers as numeric in order to preserve leading 0s?

Answers (2)

Related Questions