Keep rows that are duplicates except one column in Python

Question

Here is a sample dataset: I want to make one group with the submissions about the same house (same "sq" and "pool" values) from the same "id" user. And one group from same house but different id user.

I managed to do the first scenario in the third commented row with the comment same id team.

In the forth row I tried to find the submissions about the same house again but from different id's. Unfortunately the code in this line is not correct but I think that the idea of combining the duplicates with the "and" statement and then finding the ones with the different id would give the wanted result. Can you help?

Extra explanation:

If you print the df below then the:

-1st and 3rd row belong to the first scenario(same house, same id user)

-2nd and forth belong to the second scenario(same house, different id user)

Code:

 d = {'id': [1, 2,1,90,3], 'sq': [3,4,3,4,5], "pool": ['yes','no','yes','no','yes']} #first and third / second and forth
    dfq = pd.DataFrame(data=d)
    #dfq[dfq.duplicated(subset=['id'], keep='first')]same id team
    dfq[dfq.duplicated(subset=['sq','pool'], keep='first')] #and !=dfq[id]
    dfq

anky · Accepted Answer

IIUC, you need:

df[df.duplicated(['sq','pool'],keep=False)&~df.id.duplicated(keep=False)]

   id  sq pool
1   2   4   no
3  90   4   no

Keep rows that are duplicates except one column in Python

Answers (1)

Related Questions