Reputation: 559
Suppose I have a Pandas DataFrame with 6 columns and a custom function that takes counts of the elements in 2 or 3 columns and produces a boolean output. When a groupby object is created from the original DataFrame and the custom function is applied with df.groupby('col1').apply(myfunc), the result is a Series whose length equals the number of categories in col1. How do I expand this output to match the length of the original DataFrame? I tried transform, but was not able to use the custom function myfunc with it.
EDIT:
Here is some example code:
import pandas as pd

A = pd.DataFrame({'X':['a','b','c','a','c'], 'Y':['at','bt','ct','at','ct'], 'Z':['q','q','r','r','s']})
print(A)

# True when a group has at least 2 distinct Z values and a single distinct Y value
def myfunc(df):
    return (df['Z'].nunique() >= 2) and (df['Y'].nunique() < 2)

A.groupby('X').apply(myfunc)
I would like to expand this output into a new column Result, such that wherever column X contains 'a', Result is True.
Upvotes: 2
Views: 2813
Reputation: 823
My solution may not be the best one, since it uses a loop, but I think it works well.
The core idea is that you can traverse all the sub-DataFrames (gdf) with for i, gdf in gp, add the result column (c in my example) to each sub-DataFrame, and finally concat all the sub-DataFrames into one.
Here is an example:
import pandas as pd

df = pd.DataFrame({'a':[1,2,1,2],'b':['a','b','c','d']})
gp = df.groupby('a')  # group the rows by column 'a'
s = gp['a'].sum()  # per-group aggregate, indexed by group key
adf = []
# copy each sub-DataFrame and store its group's value in a new column 'c'
for i, gdf in gp:
    tdf = gdf.copy()
    tdf['c'] = s.loc[i]
    adf.append(tdf)
pd.concat(adf)
from:
   a  b
0  1  a
1  2  b
2  1  c
3  2  d
to:
   a  b  c
0  1  a  2
2  1  c  2
1  2  b  4
3  2  d  4
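For comparison, the same column can probably be built without a loop. The sketch below is not part of the original answer; it assumes the per-group function is a plain sum, which lets groupby(...).transform broadcast the group value back to every row while keeping the original row order.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': ['a', 'b', 'c', 'd']})

# transform returns one value per original row, aligned on the original index,
# so no concat step is needed and the rows are not reordered by group
df['c'] = df.groupby('a')['a'].transform('sum')
print(df)
```

Unlike the concat version, the rows stay in index order 0, 1, 2, 3 here.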
Upvotes: 0
Reputation: 1041
You can map the groupby result back onto the original DataFrame:
A['Result'] = A['X'].map(A.groupby('X').apply(myfunc))
The result would look like:
   X   Y  Z  Result
0  a  at  q    True
1  b  bt  q   False
2  c  ct  r    True
3  a  at  r    True
4  c  ct  s    True
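Since the question mentions that transform did not work with myfunc, here is a minimal sketch of one way transform might still be used: instead of passing the whole function, each per-column count is transformed separately and the boolean logic is applied afterwards. This is an alternative to the map approach, not a replacement for it, and the variable name g is only illustrative.

```python
import pandas as pd

A = pd.DataFrame({'X': ['a', 'b', 'c', 'a', 'c'],
                  'Y': ['at', 'bt', 'ct', 'at', 'ct'],
                  'Z': ['q', 'q', 'r', 'r', 's']})

g = A.groupby('X')  # illustrative name for the groupby object

# transform works on one column at a time, so compute each nunique separately
# (broadcast to the original length) and combine the conditions element-wise
A['Result'] = (g['Z'].transform('nunique') >= 2) & (g['Y'].transform('nunique') < 2)
print(A)
```

This reproduces the condition in myfunc for this example, but it only applies when the custom function can be decomposed into per-column aggregations; otherwise mapping the apply result is the more general route.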
Upvotes: 1