CoMartel

Reputation: 3591

saving as csv corrupts dataframe

I have a pandas dataframe of shape (455698, 62). I want to save it as a CSV file and load it again later with pandas. Currently I do this:

df.to_csv("/path/to/file.csv", index=False, sep="\\", encoding="utf-8")  # saving
df = pd.read_csv("/path/to/file.csv", delimiter="\\", encoding="utf-8")  # loading

and I get a dataframe with shape (455700, 62): two extra rows. When I checked in detail (looking at all the unique values in each column), I found that some values had moved to different columns in the process.

I've tried multiple separators and forcing dtype="object", but I can't figure out where the bug is. What should I try?

Upvotes: 1

Views: 2080

Answers (1)

MaxU - stand with Ukraine

Reputation: 210832

Is it possible that some of your strings contain a newline (\n) character?
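One way to check is to scan every column for string cells that contain a line break. The snippet below is only a rough sketch: the helper name `columns_with_line_breaks` and the toy frame `sample` are made up for illustration and stand in for the real dataframe. It also looks for carriage returns (\r), on the assumption that they can split a row in the same way.

```python
import pandas as pd


def columns_with_line_breaks(df):
    """Return {column name: number of string cells containing a CR or LF}."""
    counts = {}
    for col in df.columns:
        # Only real str values are inspected; numbers and NaN count as clean.
        has_break = df[col].map(
            lambda v: isinstance(v, str) and ("\n" in v or "\r" in v)
        )
        hits = int(has_break.sum())
        if hits:
            counts[col] = hits
    return counts


# Hypothetical toy data standing in for the real (455698, 62) frame
sample = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["plain", "two\nlines", "carriage\rreturn"],
})
print(columns_with_line_breaks(sample))
```

If the returned dict is empty for your dataframe, embedded line breaks are probably not the cause and the extra rows come from something else.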

In that case, I would suggest using quoting when saving your CSV file:

import csv

df.to_csv("/path/to/file.csv", index=False, sep="\\", encoding="utf-8", quoting=csv.QUOTE_NONNUMERIC)
...
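To confirm that quoting helps before rewriting the full file, a small round trip can be run in memory. This is a minimal sketch under assumptions: the toy frame and the in-memory `io.StringIO` buffer are stand-ins for the real dataframe and file path, and the separator is the backslash from the question.

```python
import csv
import io

import pandas as pd

# Hypothetical toy frame with line breaks embedded in one string column
df = pd.DataFrame({
    "id": [1, 2, 3],
    "comment": ["plain", "line one\nline two", "has a\rcarriage return"],
})

# Write with every non-numeric field wrapped in quotes
buf = io.StringIO()
df.to_csv(buf, index=False, sep="\\", quoting=csv.QUOTE_NONNUMERIC)

# Read back from the same buffer using the same separator
buf.seek(0)
restored = pd.read_csv(buf, sep="\\")

# The shapes should match if the quoted line breaks were parsed as data
print(df.shape, restored.shape)
```

Comparing `df.shape` with `restored.shape` (and spot-checking a few cells) on a slice of the real data should show whether the two extra rows disappear.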

Upvotes: 5
