Reputation: 849
After some research, I found this question: "Apply different functions to different items in group object: Python pandas". It may be exactly what I want, but I can't make sense of the proposed answers. Let me explain what I want with a simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame({'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
grouped = df.groupby(['B'])
Let us say we have the simple data set built from the above that looks like this:
B C D
0 one -1.758565 -1.544788
1 one -0.309472 2.289912
2 two -1.885911 0.384215
3 three 0.444186 0.551217
4 two -0.502636 2.125921
5 two -2.247551 -0.188705
6 one -0.575756 1.473056
7 three 0.640316 -0.410318
Grouping on column 'B' creates 3 groups.
Now, how can I apply a different function to each of these groups while keeping the results in the same data frame? For example, suppose I wanted to check whether elements were < 0.5 in group 1, divisible by 2 in group 2, and negative in group 3. These functions are for illustration only; the point is that a different custom function should be applied to each group, and the result should be viewable in one data frame. Any advice is appreciated.
Reputation: 16683
You can use np.where to define whatever logic you want:
df['Flag'] = np.where((df['B'] == 'one') & (df['C'] < 0.5), True, False)
df['Flag'] = np.where((df['B'] == 'two') & (df['C'] >= 0.5), True, df['Flag'])
df['Flag'] = np.where((df['B'] == 'three') & (df['C'] < 0.5), True, df['Flag'])
Out[85]:
B C D Flag
0 one -1.758565 -1.544788 True
1 one -0.309472 2.289912 True
2 two -1.885911 0.384215 False
3 three 0.444186 0.551217 True
4 two -0.502636 2.125921 False
5 two -2.247551 -0.188705 False
6 one -0.575756 1.473056 True
7 three 0.640316 -0.410318 False
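As a possible alternative to chaining three np.where calls, the same per-group conditions can be listed once and passed to np.select. This is only a sketch and not part of the original answer: the values of C are hard-coded from the table in the question so the result is reproducible, and the name `conditions` is just an illustrative choice.

```python
import numpy as np
import pandas as pd

# Values copied from the example table in the question (column D omitted).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# One boolean mask per group: the group test combined with that group's rule.
conditions = [
    (df['B'] == 'one') & (df['C'] < 0.5),
    (df['B'] == 'two') & (df['C'] >= 0.5),
    (df['B'] == 'three') & (df['C'] < 0.5),
]

# Rows matching any mask get True; all other rows fall back to the default.
df['Flag'] = np.select(conditions, [True, True, True], default=False)
print(df)
```

Adding a fourth group then only means appending one more mask and one more choice, rather than another np.where line.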
From there, let's say you then want to count the True values per group:
counts = df.groupby('B')['Flag'].sum().reset_index()  # separate name, so df keeps columns C and D
B Flag
0 one 3.0
1 three 1.0
2 two 0.0
To implement as an adjustable custom function (per comment), you can do:
def flag(one, two, three):
    # Each argument is a boolean mask for one group; modifies the global df in place.
    df['Flag'] = np.where((df['B'] == 'one') & (one), True, False)
    df['Flag'] = np.where((df['B'] == 'two') & (two), True, df['Flag'])
    df['Flag'] = np.where((df['B'] == 'three') & (three), True, df['Flag'])
flag(one=df['C'] < 0.5, two=df['C'] >= 0.5, three=df['C'] < 0.5)
df
B C D Flag
0 one -1.758565 -1.544788 True
1 one -0.309472 2.289912 True
2 two -1.885911 0.384215 False
3 three 0.444186 0.551217 True
4 two -0.502636 2.125921 False
5 two -2.247551 -0.188705 False
6 one -0.575756 1.473056 True
7 three 0.640316 -0.410318 False
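If the per-group checks really are arbitrary custom functions, as the question describes, one way to sketch it is a dict that maps each group name to its own function, applied while looping over the groupby object. This is an assumption-laden illustration rather than the answer's method: the dict name `checks` and the lambdas are hypothetical, and the values of C are hard-coded from the question's table.

```python
import pandas as pd

# Values copied from the example table in the question (column D omitted).
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.758565, -0.309472, -1.885911, 0.444186,
          -0.502636, -2.247551, -0.575756, 0.640316],
})

# Hypothetical mapping: group name -> function returning a boolean Series.
checks = {
    'one': lambda s: s < 0.5,
    'two': lambda s: s >= 0.5,
    'three': lambda s: s < 0.5,
}

# Start with all False, then fill in each group's rows with its own check.
flag = pd.Series(False, index=df.index)
for name, group in df.groupby('B'):
    flag.loc[group.index] = checks[name](group['C'])

df['Flag'] = flag
print(df)
```

Grouping by the string 'B' rather than the list ['B'] keeps each group name a plain string, which is what the dict lookup here relies on.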