Reputation: 572
I am reading a large database into multiple dataframes, which works every time, so I end up with individual dataframes. Then I write each dataframe to a csv file. Each dataframe initially has 34 columns. After writing, I read the csv file into a new dataframe, and now it has 35 columns.
I did this for writing into the csv file:
df.to_csv(path + "file_01.csv")
And this for reading from it:
import pandas as ps
df = ps.read_csv(path + "file_01.csv")
I check the number of columns with:
df.shape
Why is this happening, and how can I make it work properly?
Upvotes: 1
Views: 558
Reputation: 2811
As the other answers have already explained, the index is being saved in the .csv file along with the data. If the index values are important and need to be saved, you can leave the write as it is and change only the .read_csv() call by adding the parameter index_col=0:
df = ps.read_csv(path + "file_01.csv", index_col=0)
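To see the effect end to end, here is a minimal sketch. It assumes a small made-up 3-column frame in place of the 34-column one from the question, uses an in-memory buffer instead of a file on disk, and imports pandas under the conventional alias pd rather than ps.

```python
import io
import pandas as pd

# Hypothetical 3-column frame standing in for the 34-column one in the question.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Write with the default index=True; the index is saved as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer)
csv_text = buffer.getvalue()

# Plain read: the saved index comes back as an extra data column.
plain = pd.read_csv(io.StringIO(csv_text))

# Read with index_col=0: the first column is used as the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(plain.shape)     # one column more than df
print(restored.shape)  # same number of columns as df
```

In the plain read, the extra column is named "Unnamed: 0" because the index had no name when it was written.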
Upvotes: 1
Reputation: 948
When you write to csv in pandas, the index column is placed to the left of the data columns in the csv. To remove the index from the csv, you can use the index=False argument.
df.to_csv(path + "file_01.csv", index=False)
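A quick way to check this is a round trip through an in-memory buffer. This is only a sketch: the 3-column frame is made up for illustration, and pandas is imported as pd instead of the ps alias used in the question.

```python
import io
import pandas as pd

# Hypothetical small frame; the real one in the question has 34 columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# index=False leaves the index out of the csv entirely.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# The header row now lists only the data columns.
header = buffer.getvalue().splitlines()[0]

# Reading the same text back gives the original column count.
buffer.seek(0)
reread = pd.read_csv(buffer)

print(header)
print(reread.shape == df.shape)
```

Note that with index=False the original index values are not stored, so this fits best when the index is just the default 0, 1, 2, ... numbering.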
Upvotes: 1
Reputation: 786
According to the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
to_csv writes the index by default. When the file is read back, the index shows up as a new column.
To disable that, set index=False.
Upvotes: 1
Reputation: 578
The default value of the index argument of to_csv is True, which results in an additional index column being exported. To exclude the index column from the output, you can do:
df.to_csv(path + "file_01.csv", index=False)
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Upvotes: 1