user11688657
user11688657

Reputation: 13

Keep rows that are duplicates except one column in Python

Here is a sample dataset: I want to make one group with the submissions about the same house (same "sq" and "pool" values) from the same "id" user. And one group from same house but different id user.

I managed to do the first scenario in the third commented row with the comment same id team.

In the forth row I tried to find the submissions about the same house again but from different id's. Unfortunately the code in this line is not correct but I think that the idea of combining the duplicates with the "and" statement and then finding the ones with the different id would give the wanted result. Can you help?

Extra explanation:

If you print the df below then the:

-1st and 3rd row belong to the first scenario(same house, same id user)

-2nd and forth belong to the second scenario(same house, different id user)

Code:

 d = {'id': [1, 2,1,90,3], 'sq': [3,4,3,4,5], "pool": ['yes','no','yes','no','yes']} #first and third / second and forth
    dfq = pd.DataFrame(data=d)
    #dfq[dfq.duplicated(subset=['id'], keep='first')]same id team
    dfq[dfq.duplicated(subset=['sq','pool'], keep='first')] #and !=dfq[id]
    dfq

Upvotes: 1

Views: 1749

Answers (1)

anky
anky

Reputation: 75080

IIUC, you need:

df[df.duplicated(['sq','pool'],keep=False)&~df.id.duplicated(keep=False)]

   id  sq pool
1   2   4   no
3  90   4   no

Upvotes: 1

Related Questions