Reputation: 109
I have one dataframe that is a result set of dropping duplicates from another dataframe.
changes = full_set.drop_duplicates(subset=['Employee ID', 'Benefit Plan Type', 'Sum of Premium'], keep='last')
Then I have another where the ID and Plan Type are still listed twice:
dupe_accts = changes.set_index(['Employee ID', 'Benefit Plan Type']).index.get_duplicates()
What I'm trying to do now is build a third dataframe: if a row's ID and plan type are in dupe_accts, that row from changes should be output into the new dataframe.
So far I have
dupes = changes[['Employee ID', 'Benefit Plan Type']].isin(dupe_accts)
but this is outputting
False False
False False
False False
False False
False False
Upvotes: 1
Views: 78
Reputation: 294536
You don't need to set the index and get dupes that way. You can use duplicated to get a boolean array and mask the changes dataframe with that.
The keep=False parameter flags every occurrence of a duplicated key. With the other options, keep='first' or keep='last', the first or last occurrence respectively is not flagged as a duplicate.
duplicated = changes.duplicated(
    subset=['Employee ID', 'Benefit Plan Type'], keep=False)
dupe_accts = changes[duplicated]
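To see how keep changes the mask, here is a small self-contained sketch. The sample rows are made up for illustration (only the column names come from the question), and key, mask_all, mask_first and dupe_rows are names introduced here, not part of the original code.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `changes` frame
changes = pd.DataFrame({
    "Employee ID": [1, 1, 2, 3, 3],
    "Benefit Plan Type": ["Medical", "Medical", "Dental", "Vision", "Medical"],
    "Sum of Premium": [100.0, 120.0, 50.0, 20.0, 90.0],
})

key = ["Employee ID", "Benefit Plan Type"]

# keep=False flags every row whose key combination occurs more than once
mask_all = changes.duplicated(subset=key, keep=False)

# keep='first' (the default) leaves the first occurrence unflagged
mask_first = changes.duplicated(subset=key, keep="first")

# Boolean masking keeps only the flagged rows, with all original columns
dupe_rows = changes[mask_all]
print(dupe_rows)
```

In this sample only employee 1 has the same plan type listed twice, so mask_all selects both of those rows while mask_first selects only the second one.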
Upvotes: 3