Reputation: 49
I am trying to find whether a column contains 3 or more consecutive occurrences of the same number, and if so mark the last one with a 1 and the rest with zeros.
df['a'] = df.assign(consecutive=df.Count.groupby((df.Count != df.Count.shift()).cumsum()).transform('size')).query('consecutive > @threshold')
is what I have found here: Identifying consecutive occurrences of a value. However, this gives me the error: ValueError: Wrong number of items passed 6, placement implies 1.
I understand the issue is that the result cannot be assigned to a single column of the dataframe, but what would be the correct approach to get the desired result?
Secondly, if this condition is satisfied, I would like to apply an equation (e.g. 2*b) to multiple rows neighbouring the 1 (either the preceding or the following rows), like the shift function but repeated over e.g. the 3 previous rows. I'm quite sure this must be possible, but I have not been able to get the whole thing to work. It does not necessarily have to be based on the 1 in column c; this is just a proposal.
A small data excerpt is below for interpretation; columns c and d show the desired result:
a b c d
16215 2 0 0
24848 4 0 0
24849 4 0 8
24850 4 0 8
24851 4 1 8
24852 6 0 0
24853 6 0 0
24854 8 0 0
24855 8 0 0
24856 8 0 16
25208 8 0 16
25932 8 1 16
28448 10 0 0
28449 10 0 0
28450 10 0 0
Upvotes: 2
Views: 383
Reputation: 323306
Use cumsum with diff to create the group key, then find the last position of each group whose total count is more than 3, then use bfill with limit:
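# group key: increments each time b changes from the previous row
s=df.b.diff().ne(0).cumsum()
# size of each run, broadcast to every row of that run
s1=s.groupby(s).transform('count')
# 0-based position of each row within its run
s2=s.groupby(s).cumcount()
# flag the last row of each run that is longer than 3
df['c']=((s1==s2+1)&(s1>3)).astype(int)
# 2*b on flagged rows, back-filled into the 2 preceding rows, 0 elsewhere
df['d']=(df.c.mask(df.c==0)*df.b*2).bfill(limit=2).combine_first(df.c)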
df
Out[87]:
a b c d
0 16215 2 0 0.0
1 24848 4 0 0.0
2 24849 4 0 8.0
3 24850 4 0 8.0
4 24851 4 1 8.0
5 24852 6 0 0.0
6 24853 6 0 0.0
7 24854 8 0 0.0
8 24855 8 0 0.0
9 24856 8 0 16.0
10 25208 8 0 16.0
11 25932 8 1 16.0
12 28448 10 0 0.0
13 28449 10 0 0.0
14 28450 10 0 0.0
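For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea wrapped in a function. The helper name `mark_runs` and its parameters `min_len`, `n_rows` and `factor` are hypothetical, chosen only for illustration. It builds the group key with `shift` rather than `diff`, and casts `d` back to integers so it matches the desired output in the question; both are small substitutions, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (columns a and b only).
df = pd.DataFrame({
    "a": [16215, 24848, 24849, 24850, 24851, 24852, 24853, 24854,
          24855, 24856, 25208, 25932, 28448, 28449, 28450],
    "b": [2, 4, 4, 4, 4, 6, 6, 8, 8, 8, 8, 8, 10, 10, 10],
})


def mark_runs(frame, min_len=4, n_rows=3, factor=2):
    """Hypothetical helper: flag the last row of every run of equal `b`
    values that is at least `min_len` long (column c), then write
    factor * b into that row and the n_rows - 1 rows before it (column d)."""
    out = frame.copy()
    # A new run starts whenever b differs from the previous row.
    run_id = out["b"].ne(out["b"].shift()).cumsum()
    # Length of the run each row belongs to.
    run_len = out.groupby(run_id)["b"].transform("size")
    # 0-based position of each row inside its run.
    pos = out.groupby(run_id).cumcount()
    is_last = pos.add(1).eq(run_len)
    out["c"] = (is_last & run_len.ge(min_len)).astype(int)
    # factor * b on flagged rows, NaN everywhere else.
    vals = (out["b"] * factor).where(out["c"].eq(1))
    # Back-fill into the preceding rows; bfill rejects limit=0, so guard it.
    if n_rows > 1:
        vals = vals.bfill(limit=n_rows - 1)
    out["d"] = vals.fillna(0).astype(int)
    return out


result = mark_runs(df)
print(result)
```

With the defaults, `min_len=4` corresponds to `s1>3` above and `n_rows=3` to `bfill(limit=2)`. Note that the back-fill is not aware of run boundaries, so if `n_rows` is larger than `min_len` the values can spill into the previous run.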
Upvotes: 1