Reputation: 3802
I need to build a running total over the previous rows, where each step depends on whether the current cell matches a condition. Here's the dataframe:
import pandas as pd
data = {'col1': [1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 0, 0]}
df = pd.DataFrame(data, columns=['col1'])
df['continuous'] = df.col1
print(df)
I need to add 1 to the previous running total if the cell's value is > 0, else subtract 1, and the total must not go below 0. So, the result I'm expecting is:
    col1  continuous
0      1           1  // +1 as it's non-zero
1      2           2  // +1 as it's non-zero
2      1           3  // +1 as it's non-zero
3      0           2  // -1 as it's zero
4      0           1
5      0           0
6      0           0  // not to go less than 0
7      3           1
8      2           2
9      2           3
10     0           2
11     0           1
Case 2: instead of the condition > 0, I need < -0.1:
data = {'col1': [-0.097112634,-0.092674324,-0.089176841,-0.087302284,-0.087351866,-0.089226185,-0.092242213,-0.096446987,-0.101620036,-0.105940337,-0.109484752,-0.113515648,-0.117848816,-0.121133266,-0.123824577,-0.126030136,-0.126630895,-0.126015218,-0.124235003,-0.122715224,-0.121746573,-0.120794916,-0.120291174,-0.120323152,-0.12053229,-0.121491186,-0.122625851,-0.123819704,-0.125751858,-0.127676591,-0.129339428,-0.132342431,-0.137119556,-0.142040092,-0.14837848,-0.15439201,-0.159282645,-0.161271982,-0.162377701,-0.162838307,-0.163204393,-0.164095634,-0.165496071,-0.167224488,-0.167057078,-0.165706164,-0.163301617,-0.161423938,-0.158669389,-0.156508912,-0.15508329,-0.15365104,-0.151958972,-0.150317528,-0.149234892,-0.148259354,-0.14737422,-0.145958527,-0.144633388,-0.143120273,-0.14145652,-0.139930163,-0.138774126,-0.136710524,-0.134692221,-0.132534879,-0.129921444,-0.127974949,-0.128294058,-0.129241763,-0.132263506,-0.137828981,-0.145549768,-0.154244588,-0.163125109,-0.171814857,-0.179911465,-0.186223859,-0.190653162,-0.194761064,-0.197988536,-0.200500606,-0.20260121,-0.204797089,-0.208281065,-0.211846904,-0.215312626,-0.218696339,-0.221489975,-0.221375209,-0.220996031,-0.218558429,-0.215936558,-0.213933531,-0.21242896,-0.209682125,-0.208196607,-0.206243585,-0.202190476,-0.19913106,-0.19703291,-0.194244664,-0.189609518,-0.186600526,-0.18160171,-0.175875689,-0.170767095,-0.167453329,-0.163516985,-0.161168703,-0.158197984,-0.156378046,-0.154794499,-0.153236804,-0.15187487,-0.151623385,-0.150628282,-0.149039072,-0.14826268,-0.147535739,-0.145557646,-0.142223729,-0.139343068,-0.135355686,-0.13047743,-0.125999173,-0.12218752,-0.117021996,-0.111542982,-0.106409901,-0.101904095,-0.097910825,-0.094683375,-0.092079967,-0.088953862,-0.086268097,-0.082907394,-0.080723466,-0.078117426,-0.075431993,-0.072079536,-0.068962411,-0.064831759,-0.061257701,-0.05830671,-0.053889968,-0.048972414,-0.044763431,-0.042162829,-0.039328369,-0.038968862,-0.040450835,-0.041974942,-0.042161609,
-0.04280523,-0.042702428,-0.042593856,-0.043166561,-0.043691795,-0.044093492,-0.043965231,-0.04263305,-0.040836102,-0.039605133,-0.037204273,-0.034368645,-0.032293737,-0.029037983,-0.025509509,-0.022704668,-0.021346266,-0.019881524,-0.018675734,-0.017509566,-0.017148129,-0.016671088,-0.016015011,-0.016241862,-0.016416445,-0.016548878,-0.016475455,-0.016405742,-0.015567737,-0.014190101,-0.012373151,-0.010370329,-0.008131459,-0.006729419,-0.005667607,-0.004883919,-0.004841328,-0.005403019,-0.005343759,-0.005377974,-0.00548823,-0.004889709,-0.003884973,-0.003149113,-0.002975268,-0.00283163,-0.00322658,-0.003546589,-0.004233582,-0.004448617,-0.004706967,-0.007400356,-0.010104064,-0.01230257,-0.014430498,-0.016499501,-0.015348355,-0.013974229,-0.012845464,-0.012688459,-0.012552231,-0.013719074,-0.014404172,-0.014611632,-0.013401283,-0.011807386,-0.007417753,-0.003321279,0.000363954,0.004908491,0.010151584,0.013223831,0.016746553,0.02106351,0.024571507,0.027588073,0.031313637,0.034419301,0.037016545,0.038172954,0.038237253,0.038094387,0.037783779,0.036482515,0.036080763,0.035476154,0.034107081,0.03237083,0.030934259,0.029317076,0.028236195,0.027850758,0.024612491,0.01964433,0.015153308,0.009684456,0.003336172]}
df = pd.DataFrame(data, columns=['col1'])
lim = float(-0.1)
s = df['col1'].lt(lim)
out = s.where(s, -1).cumsum()
df['sol'] = out - out.where((out < 0) & (~s)).ffill().fillna(0)
print(df)
Upvotes: 1
Views: 177
Reputation: 942
You can do this using the cumsum function on booleans:
Give me a +1 whenever col1 is not zero:
(df.col1 != 0 ).cumsum()
Give me a -1 whenever col1 is zero:
- (df.col1 == 0 ).cumsum()
Then just add them together!
df['continuous'] = (df.col1 != 0 ).cumsum() - (df.col1 == 0 ).cumsum()
However, this does not handle the requirement you mentioned that the total must never drop below zero.
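To make that limitation concrete, here is a small sketch that runs the expression above on the question's sample data and compares it with the expected column. The names unclamped and expected are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 0, 0]})

# +1 for every non-zero row, -1 for every zero row, with no floor at 0
unclamped = (df.col1 != 0).cumsum() - (df.col1 == 0).cumsum()

# the 'continuous' column the question asks for
expected = [1, 2, 3, 2, 1, 0, 0, 1, 2, 3, 2, 1]

print(unclamped.tolist())
# [1, 2, 3, 2, 1, 0, -1, 0, 1, 2, 1, 0]

# rows where the plain cumsum differs from the expected result
print([i for i, (a, b) in enumerate(zip(unclamped, expected)) if a != b])
# [6, 7, 8, 9, 10, 11]
```

The two agree until the total would first go negative (row 6). From there on, the plain cumsum stays one lower than expected because nothing resets it to 0.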
Upvotes: 1
Reputation: 150785
The key problem here, to me, is keeping the output from going below zero. With that in mind, we can mask the output where it's negative and adjust accordingly:
# a little longer data for corner case
df = pd.DataFrame({'col1': [1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 0, 0,0,0,0,2,3,4]})
s = df.col1.gt(0)
out = s.where(s,-1).cumsum()
df['continuous'] = out - out.where((out<0)&(~s)).ffill().fillna(0)
Output:
col1 continuous
0 1 1
1 2 2
2 1 3
3 0 2
4 0 1
5 0 0
6 0 0
7 3 1
8 2 2
9 2 3
10 0 2
11 0 1
12 0 0
13 0 0
14 0 0
15 2 1
16 3 2
17 4 3
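As a cross-check, the rule can also be written as a plain loop and compared against vectorized versions. The sketch below is not part of the original answer and its function names are hypothetical. clamped_counter_masked re-states the masked-ffill formula above using explicit integer steps, and clamped_counter_cummin is an alternative that subtracts the running minimum (capped at 0).

```python
import numpy as np
import pandas as pd

def clamped_counter_loop(flags):
    """Reference: +1 where the flag is True, -1 otherwise, never below 0."""
    total, result = 0, []
    for flag in flags:
        total = max(0, total + (1 if flag else -1))
        result.append(total)
    return result

def clamped_counter_masked(flags):
    """The masked-ffill formula from above, written with integer +1/-1 steps."""
    s = pd.Series(flags, dtype=bool)
    out = pd.Series(np.where(s, 1, -1), index=s.index).cumsum()
    offset = out.where((out < 0) & (~s)).ffill().fillna(0)
    return (out - offset).astype(int).tolist()

def clamped_counter_cummin(flags):
    """Alternative: subtract the running minimum of the cumsum, capped at 0."""
    s = pd.Series(flags, dtype=bool)
    out = pd.Series(np.where(s, 1, -1), index=s.index).cumsum()
    return (out - out.cummin().clip(upper=0)).tolist()

col1 = pd.Series([1, 2, 1, 0, 0, 0, 0, 3, 2, 2, 0, 0, 0, 0, 0, 2, 3, 4])
flags = col1.gt(0)

# all three agree on the sample data used above
print(clamped_counter_loop(flags) == clamped_counter_masked(flags))  # True
print(clamped_counter_loop(flags) == clamped_counter_cummin(flags))  # True

# a series that starts with misses, recovers to 0 in the raw cumsum, then misses again
tricky = [False, False, True, True, False]
print(clamped_counter_loop(tricky))    # [0, 0, 1, 2, 1]
print(clamped_counter_cummin(tricky))  # [0, 0, 1, 2, 1]
print(clamped_counter_masked(tricky))  # [0, 0, 1, 2, 0]
```

On the tricky input the masked-ffill version resets to 0 on the last row, because the raw cumsum is negative there even though it is above its earlier minimum. The cummin variant matches the loop. This may matter for data shaped like Case 2 in the question, which starts with values above -0.1, then has a long run below it, then rises above it again. For that case the flags would be df['col1'].lt(-0.1).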
Upvotes: 1