user1583007

Reputation: 507

Python Pandas: remove duplicates in a CSV file with no headings

Sorry for the dumb question; I am new to Python and pandas.

Imagine I've got a CSV file with some data in every row, for example:

data1, data2, data3, data4

There are no headings, just data. I need to remove a row from the file whenever its third and fourth fields match those of another row, that is, if

row1.data3 == row2.data3 and row1.data4 == row2.data4

then the entire row gets removed.

How can I achieve that?

I tried to use drop_duplicates, but without headings I don't know how to tell it which columns to compare.

cheers

Upvotes: 2

Views: 1685

Answers (1)

Sergey Bushmanov

Reputation: 25209

Let's say you read a CSV file that has no header row into a DataFrame:

import pandas as pd

df = pd.read_csv("./try.csv", header=None)
df
# With header=None, pandas labels the columns with integers 0, 1, 2, ... in place of the missing names
    0   1   2
0   1   1   1
1   1   1   1
2   2   1   3
3   2   1   3
4   3   2   3
5   3   3   3

Then you can call drop_duplicates on a subset of columns, referring to them by those integer labels:

df.drop_duplicates([0])
    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3

or

df.drop_duplicates([0,1])

    0   1   2
0   1   1   1
2   2   1   3
4   3   2   3
5   3   3   3
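
By default the first row of each duplicate group is kept. If "the entire row gets removed" is meant to cover every copy, the keep parameter may be what is needed. A minimal sketch, assuming the same six rows as above (built inline here so that it runs without try.csv):

```python
import pandas as pd

# Same six rows as the example above, built inline so the snippet runs on its own.
df = pd.DataFrame([[1, 1, 1], [1, 1, 1], [2, 1, 3], [2, 1, 3], [3, 2, 3], [3, 3, 3]])

first = df.drop_duplicates(subset=[0, 1])               # default: keep="first"
last = df.drop_duplicates(subset=[0, 1], keep="last")   # keep the last row of each group
none = df.drop_duplicates(subset=[0, 1], keep=False)    # drop every row that has a duplicate

print(first.index.tolist())  # [0, 2, 4, 5]
print(last.index.tolist())   # [1, 3, 4, 5]
print(none.index.tolist())   # [4, 5]
```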

Note that drop_duplicates returns a new DataFrame and leaves df unchanged, so do not forget to assign the result to a variable or pass inplace=True.
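
For the four-column case in the question, the third and fourth fields end up labelled 2 and 3. The following is only a sketch: the sample rows are made up, and io.StringIO stands in for the real file so the snippet is self-contained.

```python
import io

import pandas as pd

# Hypothetical four-column data standing in for the asker's headerless file.
csv_text = "a,b,x,y\nc,d,x,y\ne,f,x,z\ng,h,w,z\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Columns are labelled 0..3, so the third and fourth fields are 2 and 3.
deduped = df.drop_duplicates(subset=[2, 3])

# Serialise without a header row or index column, matching the input layout.
# Passing a file path as the first argument would write to disk instead.
out = deduped.to_csv(index=False, header=False)
print(out)
```

Here the second row is dropped because its last two fields repeat those of the first row; the other rows are kept.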

Upvotes: 3
