user
user

Reputation: 2103

Group by example from SQL to pandas/python

Let's say I am trying to find how many duplicates I have for a pair of values in a table. The columns are "A" and "B" I can do

select A, B, count(*) as counter from table group by A, B

In fact, I could also do

select A, B from (select A, B, count(*) as counter from table group by A, B) where counter >= 2

to only deal with values that have n duplicates.

How can I do the same in pandas?

I can do

df.groupby(["A", "B"].count(), 

but that gives me every element, I only want to limit to those where count>=2

For example if I have:

   A  B  C
0  x  a  1
1  x  a  1
2  x  b  2
3  y  b  3
4  y  a  1

I want to identify the first two columns because groupby() gives count of 2 (the pair (x,a) is repeated). I would like to do the same for any value, not just 2.

Upvotes: 0

Views: 40

Answers (1)

BENY
BENY

Reputation: 323326

Seems like you can do filter after groupby

df.groupby(["A", "B"])['A'].count().loc[lambda x : x>2]

Update duplicated

df[df.duplicated(['A','B'],keep=False)]
Out[1178]: 
   A  B  C
0  x  a  1
1  x  a  1

transform for different n

n=2

df[df.groupby(['A','B'])['A'].transform('count')==n]

Upvotes: 1

Related Questions