Reputation: 590
I have dataframe with columns 'A' and 'flag'. i want to apply groupby function and get maximum count of consecutive 1 in each group
df=pd.DataFrame({'A':[1,1,1,1,1,1,2,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,0,1,1,0,1,1,1]})
output= pd.DataFrame({'A':[1,1,1,1,1,1,2,2,2,2,2,2,2],'consective_count_max':[3,3,3,3,3,3,3,3,3,3,3,3,3]})
Upvotes: 2
Views: 83
Reputation: 765
def function1(dd:pd.DataFrame):
col1=dd.flag.ne(1).cumsum()
return dd.groupby(col1).flag.transform(sum).max()
df.groupby('A').apply(function1).rename("consective_count_max").pipe(df.join,on='A')
out:
A flag consecutive_count_max
0 1 1 3
1 1 1 3
2 1 0 3
3 1 1 3
4 1 1 3
5 1 1 3
6 2 0 3
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
Upvotes: 0
Reputation: 30920
IIUC, GroupBy.sum
and Series.max
with level=0
.
We can use Series.map
to create a series with the original index:
blocks=df['flag'].ne(df['flag'].shift()).cumsum()
df['consecutive_count_max'] = (df['A'].map(df.groupby(['A',blocks])['flag']
.sum()
.max(level=0)))
print(df)
A flag consecutive_count_max
0 1 1 3
1 1 1 3
2 1 0 3
3 1 1 3
4 1 1 3
5 1 1 3
6 2 0 3
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
note that when adding, groups with flag == 0
will never get a sum greater than the group with flag == 1
therefore df['flag'].eq(1)
is NOT required.
If flag
is not 1 or 0 and then it is necessary to check for exampledf['flag'].eq('yes')
.we could use:
mapper = pd.crosstab(df['flag'].ne(df['flag'].shift())
.cumsum()
.loc[df['flag'].eq(1)], df['A']).max()
df['consecituve_count_max'] = df['A'].map(mapper)
#print(df)
Detail
print(df['flag'].ne(df['flag'].shift()).cumsum())
0 1
1 1
2 2
3 3
4 3
5 3
6 4
7 5
8 5
9 6
10 7
11 7
12 7
Name: flag, dtype: int64
Upvotes: 2
Reputation: 862671
Create Series
for consecutive values per groups by Series.shift
and Series.cumsum
, filter only 1
values by mask m
and then count values by SeriesGroupBy.value_counts
with max
used for mapping by original column A
by Series.map
:
m = df['flag'].eq(1)
s = df['flag'].ne(df['flag'].shift()).cumsum()[m]
df.A.map(s.groupby(df.A).value_counts().max(level=0))
print (df)
A flag consecutive_count_max
0 1 1 3
1 1 1 3
2 1 0 3
3 1 1 3
4 1 1 3
5 1 1 3
6 2 0 3
7 2 1 3
8 2 1 3
9 2 0 3
10 2 1 3
11 2 1 3
12 2 1 3
Upvotes: 1