Reputation: 507
Sorry for the dumb question; I am new to Python and pandas.
Imagine I've got a CSV file with some data in every row, for example:
data1, data2, data3, data4
There are no headings, just data, and I need to remove some rows from the file: if
row1.data3 == row2.data3 and row1.data4 == row2.data4
the entire duplicate row gets removed.
How can I achieve that?
I did try to use drop_duplicates, but without headings I don't know how to do it.
cheers
Upvotes: 2
Views: 1685
Reputation: 25209
Let's say you happen to have a df without a header:
df = pd.read_csv("./try.csv", header=None)
df
# The top row shows the integer labels pandas assigns in place of the missing column names
   0  1  2
0  1  1  1
1  1  1  1
2  2  1  3
3  2  1  3
4  3  2  3
5  3  3  3
Then you can use drop_duplicates on a subset of columns:
df.drop_duplicates([0])
   0  1  2
0  1  1  1
2  2  1  3
4  3  2  3
or
df.drop_duplicates([0,1])
   0  1  2
0  1  1  1
2  2  1  3
4  3  2  3
5  3  3  3
Do not forget to assign the result to a new variable or pass inplace=True, since drop_duplicates returns a new DataFrame by default.
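Applied to the four-column layout from the question, this might look like the sketch below. The sample rows and the names csv_text, deduped and no_repeats are made up for illustration, and the data is read from an in-memory string instead of a real file. Since the columns are labelled 0 to 3, data3 and data4 are assumed to be columns 2 and 3.

```python
import io

import pandas as pd

# Hypothetical stand-in for the headerless CSV file (data1, data2, data3, data4).
csv_text = """a,b,x,y
c,d,x,y
e,f,x,z
g,h,w,z
"""

# header=None tells pandas the first line is data, so columns are labelled 0..3.
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Keep the first row of each (data3, data4) pair and drop later repeats.
deduped = df.drop_duplicates(subset=[2, 3])

# Alternative: keep=False drops every row whose pair occurs more than once.
no_repeats = df.drop_duplicates(subset=[2, 3], keep=False)

# Serialise without the integer labels or the index, matching the input layout.
csv_out = deduped.to_csv(header=False, index=False)
print(csv_out)
```

Which variant fits depends on whether one copy of each duplicated pair should survive (the default keep="first") or none at all (keep=False).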
Upvotes: 3