Pandas DataFrame wrong indexing after reading from csv

Question

I know very little about Python's pandas module. I need to create a DataFrame and store it in a .csv file for my project, so I am using the to_csv and read_csv functions. However, when I compared the two frames (the one before exporting and the imported one), I got different results. This is the minimal reproducible example:

import sys
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

documents = []
documents.append("i love python")
documents.append("foo bar")
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
X = X.T.toarray()
# scikit-learn >= 1.0; older releases used get_feature_names(), removed in 1.2
df = pd.DataFrame(X, index=vectorizer.get_feature_names_out())
df.to_csv(path_or_buf = "db.csv")
df1 = pd.read_csv("db.csv")
print(df.axes)
print()
print(df1.axes)

And this is what is printed:

[Index(['bar', 'foo', 'love', 'python'], dtype='object'), RangeIndex(start=0, stop=2, step=1)]

[RangeIndex(start=0, stop=4, step=1), Index(['Unnamed: 0', '0', '1'], dtype='object')]

How can I make the DataFrame imported from a .csv file identical to the original one?

srinivast6 · Accepted Answer

UPDATE: Give the index of the DataFrame you are exporting a name, and pass that name as index_col when reading the exported CSV. Here I am using vectors as the index name:

import sys
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd

documents = []
documents.append("i love python")
documents.append("foo bar")
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
X = X.T.toarray()
# scikit-learn >= 1.0; older releases used get_feature_names(), removed in 1.2
df = pd.DataFrame(X, index=vectorizer.get_feature_names_out())
df.index.name = 'vectors'


df.to_csv(path_or_buf="db.csv")
df1 = pd.read_csv("db.csv",index_col='vectors')

print(df)
print()
print(df1)
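
One detail the printed frames do not show: the column labels come back from the CSV as strings ('0', '1'), as the question's output already hints, while the original frame has integer labels. Below is a minimal sketch of a full round-trip check. It assumes a small hand-written matrix in place of the TF-IDF output and an in-memory buffer in place of db.csv, both purely for illustration.

```python
import io

import pandas as pd

# Stand-in for the TF-IDF matrix: 4 terms x 2 documents (illustrative values).
df = pd.DataFrame(
    [[0.0, 0.5], [0.0, 0.5], [0.25, 0.0], [0.75, 0.0]],
    index=["bar", "foo", "love", "python"],
)
df.index.name = "vectors"

# Write to an in-memory buffer instead of db.csv, then read it back.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df1 = pd.read_csv(buf, index_col="vectors")

# CSV headers are text, so the column labels are read as '0' and '1'.
# Convert them back to integers to match the original frame.
df1.columns = df1.columns.astype("int64")

# Raises AssertionError if index, columns, dtypes or values differ.
pd.testing.assert_frame_equal(df, df1)
```

If the assertion passes silently, the imported frame matches the original in index, columns and values.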

Old answer: Try exporting the CSV without the index by setting index to False. Note that this leaves the row labels (the feature names) out of the file entirely:

df.to_csv(path_or_buf="db.csv", index=False)
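
To see the difference between the two approaches, here is a small sketch comparing index=False with a default export read back using index_col=0. The two-row frame and the in-memory buffers are assumptions for illustration, not the asker's data.

```python
import io

import pandas as pd

df = pd.DataFrame(
    [[0.0, 0.5], [0.25, 0.0]],
    index=["foo", "python"],
)

# index=False: the row labels are not written to the CSV at all.
no_index = io.StringIO()
df.to_csv(no_index, index=False)
no_index.seek(0)
df_no_index = pd.read_csv(no_index)
print(df_no_index.index)  # a default integer index; "foo"/"python" are gone

# Default export, then index_col=0: the first CSV column becomes the index.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
df_with_index = pd.read_csv(with_index, index_col=0)
print(df_with_index.index)  # the original labels are restored
```

So index=False only suits frames whose index carries no information. When the labels matter, as with the feature names here, keeping the index in the file and pointing index_col at it (by position or by name) is the safer choice.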
