Reputation: 2103
Let's say I am trying to find how many duplicates I have for a pair of values in a table. The columns are "A" and "B". I can do:
select A, B, count(*) as counter from table group by A, B
In fact, I could also do:
select A, B from (select A, B, count(*) as counter from table group by A, B) where counter >= 2
to only keep pairs that appear at least n times (n = 2 here).
How can I do the same in pandas?
I can do:
df.groupby(["A", "B"]).count()
but that gives me every group; I only want to limit the result to groups where the count is >= 2.
For example, if I have:
A B C
0 x a 1
1 x a 1
2 x b 2
3 y b 3
4 y a 1
I want to identify the first two rows, because groupby() gives a count of 2 for them (the pair (x, a) is repeated). I would like to do the same for any value of n, not just 2.
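For reference, a minimal sketch that builds this example frame (values taken from the table above):
import pandas as pd

# example data from the question
df = pd.DataFrame({"A": ["x", "x", "x", "y", "y"],
                   "B": ["a", "a", "b", "b", "a"],
                   "C": [1, 1, 2, 3, 1]})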
Upvotes: 0
Views: 40
Reputation: 323326
It seems you can just filter after groupby:
df.groupby(["A", "B"])['A'].count().loc[lambda x : x >= 2]
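Applied to the example data in the question, this should return a Series of counts indexed by the duplicated (A, B) pairs, roughly:
A  B
x  a    2
Name: A, dtype: int64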
Update: duplicated, with keep=False so that every duplicated row is kept:
df[df.duplicated(['A','B'],keep=False)]
Out[1178]:
A B C
0 x a 1
1 x a 1
transform, for a different n:
n=2
df[df.groupby(['A','B'])['A'].transform('count')==n]
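A self-contained sketch of this approach on the question's data (the frame is rebuilt here only so the snippet runs on its own):
import pandas as pd

df = pd.DataFrame({"A": ["x", "x", "x", "y", "y"],
                   "B": ["a", "a", "b", "b", "a"],
                   "C": [1, 1, 2, 3, 1]})

n = 2
# keep rows whose (A, B) pair occurs exactly n times; use >= n for "at least n"
print(df[df.groupby(['A', 'B'])['A'].transform('count') == n])
#    A  B  C
# 0  x  a  1
# 1  x  a  1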
Upvotes: 1