Reputation: 4557
having an issue with handling oddly formatted excel data and writing to CSV in a string format. In my sample data, the excel table I am importing has a column ('Item_Number') and the odd data in the cell looks like: ="0001", ="00201", 2002AA, 1003B.
When I try to output to csv, the results look like: 1, 201, 2002AA, 1003B. When I try to output to excel, the results are correct: 0001, 00201, 2002AA, 1003B.
All of the dtypes are objects. Am I missing a parameter in my .to_csv() command?
df = pd.read_excel(filename,sheetname='Sheet1', converters= {'Item_Number':str})
df.to_csv('Test_csv.csv')
df.to_excel('Test_excel.xlsx')
Tried different iterations of replacing the "=" and " " " but no response.
df.Item_Number.str.replace('=','')
Currently using the excel output but curious if there is a way to preserve string formatting in CSV. Thanks :)
Upvotes: 3
Views: 8289
Reputation: 4557
Opening an excel spreadsheet with Python 3 Pandas that has data that looks like ="0001" will go to the dataframe correctly. CSV will turn it back to "1". Keeping the same format to CSV is apparently a known issue (from my comment above). To keep the formatting I have to add =" " back into the data like this:
df['Item_Number'] = '="' + df['Item_Number'] + '"'
Not sure if there is a cleaner version to that will have an Excel opened CSV file show 0001 without the quotes and equals sign.
Upvotes: 3