scott martin

Reputation: 1293

Pandas - Finding duplicates across multiple columns and filtering the corresponding rows

I have a DataFrame as below:

import pandas as pd

df = pd.DataFrame({'col_1': ['cust_a','cust_b','cust_c','cust_d','cust_e','cust_a'],
                   'col_2': ['prod_a','prod_b','prod_b','prod_a','prod_a','prod_b'],
                   'col_3': ['cat_a','cat_b','cat_c','cat_d','cat_a','cat_a']})

This produces the DataFrame below:

col_1   col_2  col_3
cust_a  prod_a  cat_a
cust_b  prod_b  cat_b
cust_c  prod_b  cat_c
cust_d  prod_a  cat_d
cust_e  prod_a  cat_a
cust_a  prod_b  cat_a

I am trying to find the duplicate rows based on values in col_1 and col_3.

For example, the pair (cust_a, cat_a) appears twice, and I want to keep only those rows.

Expected output:

 col_1   col_2  col_3
cust_a  prod_a  cat_a
cust_a  prod_b  cat_a

I tried the below, but it does not return the expected rows:

df[df.duplicated(keep=False)]
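For context, a quick sketch of why this attempt fails: without a subset argument, duplicated() compares entire rows, and once col_2 is taken into account no row in this frame repeats, so the mask is all False and the filter returns an empty frame.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['cust_a','cust_b','cust_c','cust_d','cust_e','cust_a'],
                   'col_2': ['prod_a','prod_b','prod_b','prod_a','prod_a','prod_b'],
                   'col_3': ['cat_a','cat_b','cat_c','cat_d','cat_a','cat_a']})

# duplicated() with no subset checks all columns; every full row here is
# unique, so no row is flagged as a duplicate.
mask = df.duplicated(keep=False)
print(mask.any())       # False
print(df[mask].empty)   # True
```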

Upvotes: 1

Views: 84

Answers (1)

jezrael

Reputation: 862511

Use the subset parameter of DataFrame.duplicated:

df = df[df.duplicated(subset=['col_1','col_3'], keep=False)]
print(df)
    col_1   col_2  col_3
0  cust_a  prod_a  cat_a
5  cust_a  prod_b  cat_a
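As a side note, the same filter can be expressed with a groupby count; this is an alternative sketch (not part of the answer above) that keeps rows whose (col_1, col_3) pair occurs more than once:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['cust_a','cust_b','cust_c','cust_d','cust_e','cust_a'],
                   'col_2': ['prod_a','prod_b','prod_b','prod_a','prod_a','prod_b'],
                   'col_3': ['cat_a','cat_b','cat_c','cat_d','cat_a','cat_a']})

# Broadcast each (col_1, col_3) group's size back onto the rows, then
# keep rows belonging to groups that appear more than once.
out = df[df.groupby(['col_1', 'col_3'])['col_1'].transform('size') > 1]
print(out)
#     col_1   col_2  col_3
# 0  cust_a  prod_a  cat_a
# 5  cust_a  prod_b  cat_a
```

duplicated(subset=..., keep=False) is usually the more idiomatic choice; the groupby form is handy when the threshold is something other than "more than one".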

Upvotes: 1
