Pandas change each group into a single row

Question

I have a dataframe like the following.

>>> data
   target  user  data
0       A     1     0
1       A     1     0
2       A     1     1
3       A     2     0
4       A     2     1
5       B     1     1
6       B     1     1
7       B     1     0
8       B     2     0
9       B     2     0
10      B     2     1

Each user may contribute multiple claims about a target. I want to keep only each user's most frequent data value for each target. For example, for the dataframe shown above, I want the following result.

>>> result
  target  user  data
0      A     1     0
1      A     2     0
2      B     1     1
3      B     2     0

How can I do this? Can I do it using groupby? (My real dataframe is not sorted.)

Thanks!

BENY · Accepted Answer

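Use groupby with transform('count') to create a helper column that holds how often each (target, user, data) combination occurs, then use idxmax to pick the row with the highest count for each (target, user) pair.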

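# df is the dataframe from the question (shown there as `data`)
# helperkey = number of rows sharing the same (target, user, data)
df['helperkey'] = df.groupby(['target', 'user', 'data']).data.transform('count')
# index label of the first row with the largest helperkey in each (target, user) group
df.groupby(['target', 'user']).helperkey.idxmax()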
Out[10]: 
target  user
A       1       0
        2       3
B       1       5
        2       8
Name: helperkey, dtype: int64
df.loc[df.groupby(['target','user']).helperkey.idxmax()]
Out[11]: 
  target  user  data  helperkey
0      A     1     0          2
3      A     2     0          1
5      B     1     1          2
8      B     2     0          2
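
For comparison, here is a minimal self-contained sketch of a variant that does not add a helper column to the original dataframe. It is not the answer's method: it counts each (target, user, data) combination with size() and then keeps the highest count per (target, user). The names counts, best, and n are made up for illustration. One assumption to be aware of: because groupby sorts its keys by default, a tie is resolved in favour of the smallest data value, whereas the helper-column approach above resolves it in favour of the row that appears first.

```python
import pandas as pd

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    "target": list("AAAAABBBBBB"),
    "user": [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2],
    "data": [0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1],
})

# One row per (target, user, data) combination, with its frequency in column "n"
counts = df.groupby(["target", "user", "data"]).size().reset_index(name="n")

# For each (target, user), keep the combination with the highest frequency
best = counts.loc[counts.groupby(["target", "user"])["n"].idxmax()]

# Drop the helper column and renumber the rows
result = best.drop(columns="n").reset_index(drop=True)
print(result)
```

Since the frequencies are computed from the grouped keys, the input does not need to be sorted.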
