data_person
data_person

Reputation: 4490

Pandas groupby and max of string column

Sample DF:

df = pd.DataFrame(np.random.randint(1,10,size=(6,2)),columns = list("AB"))
df["A"] = ["1111","2222","1111","1111","2222","1111"]
df["B"] = ["20010101","20010101","20010101","20010101","20010201","20010201"]
df

OP:

     A          B       
 0  1111    20010101    
 1  2222    20010101    
 2  1111    20010101    
 3  1111    20010101    
 4  2222    20010201    
 5  1111    20010201    

I am trying to find the max transactions done by the user_id in a single day.

For example, for ID: "1111" has done 3 transactions on "200010101" and 1 transaction on "20010201" so the maximum here should be 3, while the ID: 2222 has done 1 transaction on "20010101" and 1 transaction on "20010202" so the op is 1.

Expected OP:

       MAX TRANS IN SINGLE DAY

1111      3
2222      1

Any pandas way to achieve this instead of creating groups and iterating through it.

Upvotes: 0

Views: 5797

Answers (2)

Andy L.
Andy L.

Reputation: 25259

To find max you need groupby, unstack, max on index

In [1832]: df.groupby(['A', 'B'])['A'].count().unstack().max(axis=1)
Out[1832]:
A
1111    3
2222    1
dtype: int64

Upvotes: 2

Erfan
Erfan

Reputation: 42916

We can do groupby twice. First we get the count of each occurence in column B of each ID in column A. Then we groupby again and get the max value:


df2 = pd.DataFrame(df.groupby(['A', 'B'])['B'].count())\
                  .rename({'B':'MAX TRANS SINGLE DAY'}, axis=1)\
                  .reset_index()

df = df2.groupby('A', as_index=False).agg({'MAX TRANS SINGLE DAY':['max', 'min']})

print(df)
      A MAX TRANS SINGLE DAY    
                         max min
0  1111                    3   1
1  2222                    1   1

Upvotes: 1

Related Questions