Reputation: 139
I wrote this code just to show the problem I'm having. I need to save my data to a CSV and reopen it later, but when I reload the data from the CSV into a pandas DataFrame, it has an extra unnamed column at the front that I don't want. It's messing up my data when I call .drop_duplicates(), because each row now has its own number, and every time I save and reopen the CSV it gains another column of numbers at the front, which makes everything worse. How do I stop this from happening?
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(100,4), columns=list('ABCD'))
df.to_csv('data.csv')
print(df.head())
df1 = pd.read_csv('data.csv')
print(df1.head())
Upvotes: 1
Views: 1089
Reputation: 139
The solution was super easy. I needed to do
df.to_csv('data.csv', index=False)
Upvotes: 0
Reputation: 77357
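It's the DataFrame index. By default, to_csv writes it as the first column, and since the index has no name, read_csv loads it back as an unnamed column. You can turn that off with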
df.to_csv('data.csv', index=False)
The docs are the first stop for learning the options available when writing: pandas.DataFrame.to_csv
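A minimal sketch of the round trip with the index turned off, so the effect can be checked directly. It writes to an in-memory buffer (io.StringIO) as a stand-in for 'data.csv'; that substitution is an assumption made only to keep the example self-contained.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list("ABCD"))

# In-memory buffer used here in place of 'data.csv' (illustrative stand-in).
buf = io.StringIO()
df.to_csv(buf, index=False)  # do not write the index as a column
buf.seek(0)

df1 = pd.read_csv(buf)
print(list(df1.columns))  # only A, B, C, D; no 'Unnamed: 0'
```

With index=False the file contains only the four data columns, so repeated save/reload cycles no longer add a new numbered column each time.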
Upvotes: 1
Reputation: 15488
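While reading, you can drop rows that contain missing values with dropna(). Note that this only removes rows with NaN values; it does not remove the unnamed index column: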
df = pd.read_csv("data.csv").dropna()
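If the file has already been written with the index in it, one read-side option (not what this answer proposes, just a sketch) is to tell read_csv that the first column is the index, using its index_col parameter. The in-memory buffer below is an illustrative stand-in for 'data.csv'.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list("ABCD"))

# In-memory buffer used here in place of 'data.csv' (illustrative stand-in).
buf = io.StringIO()
df.to_csv(buf)  # default: the index is written as an unnamed first column
buf.seek(0)

# Treat the first column as the index instead of loading it as data.
df1 = pd.read_csv(buf, index_col=0)
print(list(df1.columns))  # only A, B, C, D
```

Because the numbers are restored as the index rather than as a data column, .drop_duplicates() compares only the A to D values.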
Upvotes: 0