Reputation: 10193
I read a CSV file and I want to remove duplicate rows.
When I run the code below, the saved file gains a new first row that contains column numbers and a new first column that contains row numbers.
Why does it do that and how should I fix this?
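import pandas as pd

def remove_duplicates(file):
    # header=None: the file has no header row, so pandas labels the columns 0, 1, 2, ...
    df = pd.read_csv(file, encoding="latin-1", header=None)
    Helper.printline(f"Rows in file {file}: {df.shape[0]}")
    df.drop_duplicates(keep='first', inplace=True)
    Helper.printline(f"Rows in file {file} with duplicates removed: {df.shape[0]}")
    # With default arguments, to_csv also writes the column labels and the row index
    df.to_csv(file)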
Upvotes: 1
Views: 25
Reputation: 149
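By default, to_csv writes the column labels as a header row and the row index as the first column. Because you read the file with header=None, the column labels are just the integers 0, 1, 2, ..., and the index holds the row numbers, which is the extra row and column you see.
Use df.to_csv(file, header=False, index=False) to save the .csv file without the header row and the index column.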
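Here is a minimal sketch of the difference, assuming a small two-column sample in place of your real file. The function name remove_duplicates_csv and the in-memory io.StringIO buffers are only for illustration; in your code you would pass the file path as before (plus encoding="latin-1" if the file needs it).

```python
import io

import pandas as pd


def remove_duplicates_csv(src, dst):
    """Read a header-less CSV, drop duplicate rows, and write it back
    without adding a header row or an index column."""
    df = pd.read_csv(src, header=None)
    df = df.drop_duplicates(keep="first")
    # header=False skips the column labels, index=False skips the row index
    df.to_csv(dst, header=False, index=False)
    return df


# Hypothetical sample data: the third row duplicates the first
raw = "a,1\nb,2\na,1\n"

# Default to_csv: column labels and row index are written too
default_out = io.StringIO()
pd.read_csv(io.StringIO(raw), header=None).drop_duplicates().to_csv(default_out)
print(default_out.getvalue().splitlines())  # [',0,1', '0,a,1', '1,b,2']

# With header=False, index=False only the data rows are written
fixed_out = io.StringIO()
remove_duplicates_csv(io.StringIO(raw), fixed_out)
print(fixed_out.getvalue().splitlines())  # ['a,1', 'b,2']
```

Note that the row numbers in the default output are the original index values, so after dropping duplicates they can have gaps; index=False avoids writing them at all.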
Upvotes: 2