UGuntupalli

Reputation: 849

Implement different functions for different groupby objects pandas

After some research, I found a related question (Apply different functions to different items in group object: Python pandas). It may be exactly what I want, but I can't make sense of the proposed answers. Let me explain what I want with a simple example:

import pandas as pd
import numpy as np

df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
grouped = df.groupby(['B'])

Let us say we have the simple data set built from the above that looks like this:

       B         C         D
0    one -1.758565 -1.544788
1    one -0.309472  2.289912
2    two -1.885911  0.384215
3  three  0.444186  0.551217
4    two -0.502636  2.125921
5    two -2.247551 -0.188705
6    one -0.575756  1.473056
7  three  0.640316 -0.410318

Grouping on column 'B' creates 3 groups:

  1. one
  2. two
  3. three

Now, how can I apply a different function to each of these groups while keeping the results in the same data frame? For example, I might want to check whether elements are < 0.5 in group 1, divisible by 2 in group 2, and negative in group 3. These functions are for illustration only; the point is that a different custom function should be applied to each group, with the results viewable in one data frame. Any advice is appreciated.

Upvotes: 1

Views: 155

Answers (1)

David Erickson

Reputation: 16683

You can use np.where to define whatever logic you want:

df['Flag'] = np.where((df['B'] == 'one') & (df['C'] < 0.5), True, False)
df['Flag'] = np.where((df['B'] == 'two') & (df['C'] >= 0.5), True, df['Flag'])
df['Flag'] = np.where((df['B'] == 'three') & (df['C'] < 0.5), True, df['Flag'])

Out[85]: 
       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False
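As a possible alternative to chaining three np.where calls, np.select can express the same per-group logic in one step. This is only a sketch: the names conditions and choices are mine, the data is copied from the question's table (columns B and C only), and the thresholds are the ones used above.

```python
import numpy as np
import pandas as pd

# Values copied from the question's example table (column D omitted).
df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': [-1.758565, -0.309472, -1.885911, 0.444186,
                         -0.502636, -2.247551, -0.575756, 0.640316]})

# One membership test per group ...
conditions = [df['B'] == 'one', df['B'] == 'two', df['B'] == 'three']
# ... paired with the rule that should apply to rows of that group.
choices = [df['C'] < 0.5, df['C'] >= 0.5, df['C'] < 0.5]

# For each row, take the value from the first choice whose condition holds;
# rows that belong to none of the groups get False.
df['Flag'] = np.select(conditions, choices, default=False)
print(df)
```

Each rule sits next to the group it belongs to, so adding a fourth group means adding one entry to each list.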

From there, let's say you then want to count the rows flagged True in each group. Assign the result to a new name so the original df is not overwritten:

counts = df.groupby('B')['Flag'].sum().reset_index()

       B    Flag
0    one     3.0
1  three     1.0
2    two     0.0

To implement as an adjustable custom function (per comment), you can do:

def flag(df, one, two, three):
    # one, two, three: boolean Series holding the condition for each group.
    # Adds (or overwrites) the 'Flag' column on df in place.
    df['Flag'] = np.where((df['B'] == 'one') & (one), True, False)
    df['Flag'] = np.where((df['B'] == 'two') & (two), True, df['Flag'])
    df['Flag'] = np.where((df['B'] == 'three') & (three), True, df['Flag'])


flag(df, one=df['C'] < 0.5, two=df['C'] >= 0.5, three=df['C'] < 0.5)
df

       B         C         D   Flag
0    one -1.758565 -1.544788   True
1    one -0.309472  2.289912   True
2    two -1.885911  0.384215  False
3  three  0.444186  0.551217   True
4    two -0.502636  2.125921  False
5    two -2.247551 -0.188705  False
6    one -0.575756  1.473056   True
7  three  0.640316 -0.410318  False
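If the per-group logic is better written as separate functions, as the question asks, one option is to keep them in a dict keyed by group name and loop over the groupby object. This is a rough sketch rather than the only way to do it: the rules dict and its lambdas are hypothetical stand-ins for your own functions, and the data is again copied from the question's table.

```python
import pandas as pd

# Values copied from the question's example table (column D omitted).
df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': [-1.758565, -0.309472, -1.885911, 0.444186,
                         -0.502636, -2.247551, -0.575756, 0.640316]})

# Hypothetical per-group functions; each takes a Series and returns a
# boolean Series with the same index.
rules = {
    'one': lambda s: s < 0.5,
    'two': lambda s: s >= 0.5,
    'three': lambda s: s < 0.5,
}

# Apply each group's own function to that group's rows only.
# Grouping by the string 'B' (not the list ['B']) yields plain string keys.
pieces = [rules[name](group['C']) for name, group in df.groupby('B')]

# The pieces keep their original row labels, so assignment aligns on the index.
df['Flag'] = pd.concat(pieces)
print(df)
```

Because every function only ever sees its own group's rows, each one can be an arbitrary custom function without needing the group check inside it.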

Upvotes: 3
