Reputation: 97
I have a data frame like this:
Category Date_1 Score_1 Date_2 Score_2
A 13/11/2019 5 13/11/2019 10
A 13/11/2019 5 14/11/2019 55
A 13/11/2019 5 15/11/2019 45
A 13/11/2019 5 16/11/2019 80
A 14/11/2019 3 13/11/2019 10
A 14/11/2019 3 14/11/2019 55
A 14/11/2019 3 15/11/2019 45
A 14/11/2019 3 16/11/2019 80
A 15/11/2019 7 13/11/2019 10
A 15/11/2019 7 14/11/2019 55
A 15/11/2019 7 15/11/2019 45
A 15/11/2019 7 16/11/2019 80
B 13/11/2019 4 13/11/2019 18
B 13/11/2019 4 14/11/2019 65
B 13/11/2019 4 15/11/2019 75
B 13/11/2019 4 16/11/2019 89
B 14/11/2019 9 13/11/2019 18
B 14/11/2019 9 14/11/2019 65
B 14/11/2019 9 15/11/2019 75
B 14/11/2019 9 16/11/2019 89
B 15/11/2019 8 13/11/2019 18
B 15/11/2019 8 14/11/2019 65
B 15/11/2019 8 15/11/2019 75
B 15/11/2019 8 16/11/2019 89
I want to keep the rows where both dates are same. I was doing this:
df.drop_duplicates(subset=['Date_1', 'Date_2'])
But it do not work. Can`t figure out how to drop those extra rows?
Upvotes: 2
Views: 2399
Reputation: 916
You can use duplicated
with the parameter subset
for specifying columns to be checked with keep=False
, for all duplicates for masking and filtering by boolean indexing. The following should work:
df = df[df.duplicated(subset=['Date_1', 'Date_2'], keep=False)]
Remark: Initially, I may have misread that OP wanted to drop duplicates, with answers below:
df = df.drop_duplicates(subset=['Date_1', 'Date_2'], keep='last')
You should always assign the df
back like above to apply the changes. Otherwise you can add inplace=True
to apply the changes without assignment:
df.drop_duplicates(subset=['Date_1', 'Date_2'], keep='last', inplace=True)
If this doesn't work as well as the solutions provided by others, you may want to check whether there is any difference in data types for those columns which are in subset
, for example datetime formatting.
Upvotes: 2
Reputation: 863701
Use boolean indexing
with compare both columns:
df1 = df[df['Date_1'] == df['Date_2'])
Or DataFrame.query
:
df1 = df.query("Date_1 == Date_2")
Upvotes: 5