Check non-duplicates by groups Python

Question

I want to check non-duplicates in groups and return all these non-duplicates but only when these keys shows multiple times. For example,

Var  Code   Description
1    A      ABC
2    B      EF
1    A      ABCD
2    B      EF
1    B      ABC
3    C      DD

I expect to get some result like this

Var  Code Description
1    A    ABC
1    A    ABCD

The reason I want this table is to detect which (var,code) pair have multiple description, because in my case, if it has multiple description, it means the description for these pairs should be modified to be unique.

This table returns rows that have same variable name and same code name but different description. Because for my data, a code of a variable should have only one description instead of multiple descriptions.

3 C DD is not returned is because there is only one record about (3,C) pair and DD is the only description for this pair.

2 B EF is not returned is because though two rows contain (2,B) pair, they have the same description which is normal.

1 B ABC is not returned is because though var 1 shows three times but (1,B) pair shows only one time and has only one description ABC.

Any solutions?

Thank you!

Vaishali · Accepted Answer

You can use groupby and filter the rows using nunique

new_df = df.groupby(['Var', 'Code']).filter(lambda x: x['Description'].nunique() > 1) 

    Var Code    Description
0   1   A       ABC
2   1   A       ABCD

Check non-duplicates by groups Python

Answers (2)

Related Questions