Reputation: 4311
I have a text file like this:
id,name,sex,
1,Sam,M,
2,Ann,F,
3,Peter,
4,Ben,M,
Then, I read the file: df = pd.read_csv('data.csv')
After that, I write it to another file: df.to_csv('new_data.csv', index = False)
Then, I get
id,name,sex,Unnamed: 3
1,Sam,M,
2,Ann,F,
3,Peter,,
4,Ben,M,
You see that there are two commas instead of one in the fourth line.
How to preserve the format when using pd.to_csv
?
Upvotes: 0
Views: 2088
Reputation: 4792
The problem in your code is that you have a comma after the sex
column in your file. So read_csv thinks it's a new column, which has no name and data.
df= pd.read_csv('data.csv')
df
id name sex Unnamed: 3
0 1 Sam M NaN
1 2 Ann F NaN
2 3 Peter NaN NaN
3 4 Ben M NaN
Hence you have an extra Unnamed column. So when you write the to_csv, it adds two empty values in the 3rd row and hence why, two ,
.
Try:
df = pd.read_csv('data.csv', use_cols = ['id', 'name', 'sex'])
df.to_csv('new_data.csv', index = False)
Upvotes: 1
Reputation: 26005
pandas
is preserving the format - the 3d row has no sex, and as such the csv should have an empty column - that is why you get to commas, since you are separating an empty column.
Your original text file was not a valid csv file.
What you want to do is something else, which is not write a valid csv file - you will have to do this yourself, I do not know of any existing method to create your format.
Upvotes: 1