Nickel
Nickel

Reputation: 590

How to count consective 1 in column and get max count of each group

I have dataframe with columns 'A' and 'flag'. i want to apply groupby function and get maximum count of consecutive 1 in each group

input data:

df=pd.DataFrame({'A':[1,1,1,1,1,1,2,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,0,1,1,0,1,1,1]}) 

required output

output= pd.DataFrame({'A':[1,1,1,1,1,1,2,2,2,2,2,2,2],'consective_count_max':[3,3,3,3,3,3,3,3,3,3,3,3,3]})  

Upvotes: 2

Views: 83

Answers (3)

G.G
G.G

Reputation: 765

def function1(dd:pd.DataFrame):
    col1=dd.flag.ne(1).cumsum()
    return dd.groupby(col1).flag.transform(sum).max()

df.groupby('A').apply(function1).rename("consective_count_max").pipe(df.join,on='A')

out:

 A  flag  consecutive_count_max
0   1     1                      3
1   1     1                      3
2   1     0                      3
3   1     1                      3
4   1     1                      3
5   1     1                      3
6   2     0                      3
7   2     1                      3
8   2     1                      3
9   2     0                      3
10  2     1                      3
11  2     1                      3
12  2     1                      3

Upvotes: 0

ansev
ansev

Reputation: 30920

IIUC, GroupBy.sum and Series.max with level=0. We can use Series.map to create a series with the original index:

blocks=df['flag'].ne(df['flag'].shift()).cumsum()
df['consecutive_count_max'] = (df['A'].map(df.groupby(['A',blocks])['flag']
                                             .sum()
                                             .max(level=0)))
print(df)

    A  flag  consecutive_count_max
0   1     1                      3
1   1     1                      3
2   1     0                      3
3   1     1                      3
4   1     1                      3
5   1     1                      3
6   2     0                      3
7   2     1                      3
8   2     1                      3
9   2     0                      3
10  2     1                      3
11  2     1                      3
12  2     1                      3

note that when adding, groups with flag == 0 will never get a sum greater than the group with flag == 1 therefore df['flag'].eq(1) is NOT required.

If flag is not 1 or 0 and then it is necessary to check for exampledf['flag'].eq('yes') .we could use:

mapper = pd.crosstab(df['flag'].ne(df['flag'].shift())
                               .cumsum()
                              .loc[df['flag'].eq(1)], df['A']).max()
df['consecituve_count_max'] = df['A'].map(mapper)
#print(df)

Detail

print(df['flag'].ne(df['flag'].shift()).cumsum())

0     1
1     1
2     2
3     3
4     3
5     3
6     4
7     5
8     5
9     6
10    7
11    7
12    7
Name: flag, dtype: int64

Upvotes: 2

jezrael
jezrael

Reputation: 862671

Create Series for consecutive values per groups by Series.shift and Series.cumsum, filter only 1 values by mask m and then count values by SeriesGroupBy.value_counts with max used for mapping by original column A by Series.map:

m = df['flag'].eq(1)
s = df['flag'].ne(df['flag'].shift()).cumsum()[m]
df.A.map(s.groupby(df.A).value_counts().max(level=0))
print (df)
    A  flag  consecutive_count_max
0   1     1                      3
1   1     1                      3
2   1     0                      3
3   1     1                      3
4   1     1                      3
5   1     1                      3
6   2     0                      3
7   2     1                      3
8   2     1                      3
9   2     0                      3
10  2     1                      3
11  2     1                      3
12  2     1                      3

Upvotes: 1

Related Questions