Troy R

Reputation: 109

How to get rows from one dataframe if values are in another dataframe

I have one dataframe that is a result set of dropping duplicates from another dataframe.

changes = full_set.drop_duplicates(subset=['Employee ID', 'Benefit Plan Type', 'Sum of Premium'], keep='last')

Then I have another object listing the Employee ID and Benefit Plan Type pairs that still appear more than once:

dupe_accts = changes.set_index(['Employee ID', 'Benefit Plan Type']).index.get_duplicates()

What I'm trying to do now is build a third dataframe: if a row's Employee ID and Benefit Plan Type are in dupe_accts, that row from changes should be output into the new dataframe.

So far I have

dupes = changes[['Employee ID', 'Benefit Plan Type']].isin(dupe_accts)

but this is outputting

False False
False False
False False
False False
False False

Upvotes: 1

Views: 78

Answers (1)

piRSquared

Reputation: 294536

You don't need to set the index and get the dupes that way. You can use duplicated to get a boolean Series and mask the changes dataframe with it.

Passing keep=False marks every row of a duplicated group as a duplicate. The other options leave one row unmarked: keep='first' (the default) does not flag the first occurrence, and keep='last' does not flag the last.
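To make the difference between the keep options concrete, here is a minimal sketch. The sample data is made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
changes = pd.DataFrame({
    "Employee ID": [1, 1, 2, 3, 3],
    "Benefit Plan Type": ["Medical", "Medical", "Dental", "Vision", "Vision"],
    "Sum of Premium": [100, 120, 50, 30, 35],
})
keys = ["Employee ID", "Benefit Plan Type"]

# keep='first' leaves the first row of each duplicated group unflagged.
flag_first = changes.duplicated(subset=keys, keep="first")
# keep='last' leaves the last row of each duplicated group unflagged.
flag_last = changes.duplicated(subset=keys, keep="last")
# keep=False flags every row that belongs to a duplicated group.
flag_all = changes.duplicated(subset=keys, keep=False)

print(pd.DataFrame({"first": flag_first, "last": flag_last, "False": flag_all}))
```

Only keep=False flags both rows of each repeated pair, which is what the question needs.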

duplicated = changes.duplicated(
    subset=['Employee ID', 'Benefit Plan Type'], keep=False)
dupe_accts = changes[duplicated]
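For completeness, here is a small end-to-end sketch on made-up data. It also shows a variant that stays closer to the attempt in the question. That attempt most likely returned all False because DataFrame.isin compares each individual cell against the values, and a single cell never equals an (ID, plan type) pair. Comparing the whole key pair through a MultiIndex avoids this. The sample data and the name dupes_via_isin are assumptions for illustration. Note also that Index.get_duplicates is not available in recent pandas versions, so the sketch uses Index.duplicated instead.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
changes = pd.DataFrame({
    "Employee ID": [1, 1, 2, 3, 3],
    "Benefit Plan Type": ["Medical", "Medical", "Dental", "Vision", "Vision"],
    "Sum of Premium": [100, 120, 50, 30, 35],
})
keys = ["Employee ID", "Benefit Plan Type"]

# Approach from the answer: mask with duplicated(keep=False).
dupes = changes[changes.duplicated(subset=keys, keep=False)]

# Variant closer to the question: collect the repeated key pairs,
# then test each row's (ID, plan type) pair against them as a unit.
idx = changes.set_index(keys).index        # MultiIndex of key pairs, one per row
dupe_accts = idx[idx.duplicated()].unique()  # pairs that occur more than once
dupes_via_isin = changes[idx.isin(dupe_accts)]

print(dupes)
```

On this data both approaches select the same rows. The duplicated version is shorter and avoids building the intermediate index.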

Upvotes: 3
