Reputation: 373
I'd like to figure out how often a negative values occurs and how long that negative price occurs.
example df
d = {'value': [1,2,-3,-4,-5,6,7,8,-9,-10], 'period':[1,2,3,4,5,6,7,8,10]}
df = pd.DataFrame(data=d)
I checked which rows had negative values. df['value'] < 0
I thought I could just iterate through each row, keep a counter for when a negative value occurs and perhaps moving that row to another df, as I would like to save the beginning period
and ending period
.
What I'm currently trying
def count_negatives(df):
df_negatives = pd.DataFrame(columns=['start','end', 'counter'])
for index, row in df.iterrows():
counter = 0
df_negative_index = 0
while(row['value'] < 0):
# if its the first one add it to df as start ?
# grab the last one and add it as end
#constantly overwrite the counter?
counter += 1
#add counter to df row
df_negatives['counter'] = counter
return df_negatives
Except that gives me an infinite loop I think. If I replace while
with an if
I'm stuck comming up with a way to keep track of how long.
Upvotes: 3
Views: 96
Reputation: 863301
I think better is avoid loops:
#compare by <
a = df['value'].lt(0)
#running sum
b = a.cumsum()
#counter only for negative consecutive values
df['counter'] = b-b.mask(a).ffill().fillna(0).astype(int)
print (df)
value period counter
0 1 1 0
1 2 2 0
2 -3 3 1
3 -4 4 2
4 -5 5 3
5 6 6 0
6 7 7 0
7 8 8 0
8 -9 9 1
9 -10 10 2
Or if dont need reset counter :
a = df['value'].lt(0)
#repalce values per mask a to 0
df['counter'] = a.cumsum().where(a, 0)
print (df)
value period counter
0 1 1 0
1 2 2 0
2 -3 3 1
3 -4 4 2
4 -5 5 3
5 6 6 0
6 7 7 0
7 8 8 0
8 -9 9 4
9 -10 10 5
If want start and end period:
#comapre for negative mask
a = df['value'].lt(0)
#inverted mask
b = (~a).cumsum()
#filter only negative rows
c = b[a].reset_index()
#aggregate first and last value per groups
df = (c.groupby('value')['index']
.agg([('start', 'first'),('end', 'last')])
.reset_index(drop=True))
print (df)
start end
0 2 4
1 8 9
Upvotes: 2
Reputation: 164783
I would like to save the beginning period and ending period.
If this is your requirement, you can use itertools.groupby
. Note also a period
series is not required, as Pandas provides a natural integer index (beginning at 0) if not explicitly provided.
from itertools import groupby
from operator import itemgetter
d = {'value': [1,2,-3,-4,-5,6,7,8,-9,-10]}
df = pd.DataFrame(data=d)
ranges = []
for k, g in groupby(enumerate(df['value'][df['value'] < 0].index), lambda x: x[0]-x[1]):
group = list(map(itemgetter(1), g))
ranges.append((group[0], group[-1]))
print(ranges)
[(2, 4), (8, 9)]
Then, to convert to a dataframe:
df = pd.DataFrame(ranges, columns=['start', 'end'])
print(df)
start end
0 2 4
1 8 9
Upvotes: 1