Reputation: 13
I have a column (price) whose values change over time. From one row to another, the value increases, decreases or stays the same. I want to record how many times the value reached a new high.
So I added a column currenthigh, which keeps track of the highest value so far. Then I added another column, currenthigh_shift, which is the currenthigh column shifted by one row. This way I can compare both values, the current and the previous. If currenthigh > currenthigh_shift, I have a new high, which is recorded in newhighscount.
I have been trying to use .cummax() for this, which seemed appropriate.
df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = df['newhighscount'].cummax() + 1
I was expecting this:
datetime last currenthigh currenthigh_shift **newhighscount**
31 2019-04-02 07:57:33 389.8400 389.84 NaN 0
32 2019-04-02 07:57:33 389.8400 389.84 389.84 0
33 2019-04-02 07:57:33 389.8700 389.87 389.84 **1**
34 2019-04-02 07:57:33 389.8800 389.88 389.87 **2**
35 2019-04-02 07:57:33 389.9000 389.90 389.88 **3**
36 2019-04-02 07:57:33 389.9600 389.96 389.90 **4**
37 2019-04-02 07:57:35 389.9000 389.96 389.96 **4**
38 2019-04-02 07:57:36 389.9000 389.96 389.96 **4**
39 2019-04-02 08:00:00 389.3603 389.96 389.96 **4**
40 2019-04-02 08:00:00 388.8500 389.96 389.96 **4**
41 2019-04-02 08:00:00 390.0000 390.00 389.96 **5**
42 2019-04-02 08:00:01 389.7452 390.00 390.00 **5**
43 2019-04-02 08:00:01 389.4223 390.00 390.00 5
44 2019-04-02 08:00:01 389.8000 390.00 390.00 5
And I am getting this:
datetime last currenthigh currenthigh_shift newhighscount
31 2019-04-02 07:57:33 389.8400 389.84 NaN 0
32 2019-04-02 07:57:33 389.8400 389.84 389.84 0
33 2019-04-02 07:57:33 389.8700 389.87 389.84 1
34 2019-04-02 07:57:33 389.8800 389.88 389.87 1
35 2019-04-02 07:57:33 389.9000 389.90 389.88 1
36 2019-04-02 07:57:33 389.9600 389.96 389.90 1
37 2019-04-02 07:57:35 389.9000 389.96 389.96 0
38 2019-04-02 07:57:36 389.9000 389.96 389.96 0
39 2019-04-02 08:00:00 389.3603 389.96 389.96 0
40 2019-04-02 08:00:00 388.8500 389.96 389.96 0
41 2019-04-02 08:00:00 390.0000 390.00 389.96 1
42 2019-04-02 08:00:01 389.7452 390.00 390.00 0
43 2019-04-02 08:00:01 389.4223 390.00 390.00 0
44 2019-04-02 08:00:01 389.8000 390.00 390.00 0
Basically, the df['newhighscount'].cummax() doesn't seem to return anything.
Upvotes: 1
Views: 180
Reputation: 25239
Edit: based on your data, the single command below would be enough:
df['newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
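To make that one-liner easy to try, here is a minimal sketch that rebuilds the helper columns from the `last` values shown in the question. The way `currenthigh` and `currenthigh_shift` are constructed here (via `cummax()` and `shift(1)`) is my assumption based on how the question describes them.

```python
import pandas as pd

# 'last' values copied from rows 31-44 of the question
last = [389.84, 389.84, 389.87, 389.88, 389.90, 389.96, 389.90,
        389.90, 389.3603, 388.85, 390.00, 389.7452, 389.4223, 389.80]
df = pd.DataFrame({"last": last}, index=range(31, 45))

# Assumed construction of the helper columns described in the question
df["currenthigh"] = df["last"].cummax()
df["currenthigh_shift"] = df["currenthigh"].shift(1)

# True wherever the running high moved up; comparing with NaN gives False
is_new_high = df["currenthigh"] > df["currenthigh_shift"]

# Running total of the True values is the number of new highs so far
df["newhighscount"] = is_new_high.astype(int).cumsum()
print(df)
```

The first row compares against NaN, which evaluates to False, so the count starts at 0 without any special handling.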
Original:
Your logic still works, but it's not as elegant as the other answers. It just needs a small twist.
In [983]: df
Out[983]:
datetime last currenthigh currenthigh_shift newhighscount
31 2019-04-02 07:57:33 389.8400 389.84 NaN 0
32 2019-04-02 07:57:33 389.8400 389.84 389.84 0
33 2019-04-02 07:57:33 389.8700 389.87 389.84 0
34 2019-04-02 07:57:33 389.8800 389.88 389.87 0
35 2019-04-02 07:57:33 389.9000 389.90 389.88 0
36 2019-04-02 07:57:33 389.9600 389.96 389.90 0
37 2019-04-02 07:57:35 389.9000 389.96 389.96 0
38 2019-04-02 07:57:36 389.9000 389.96 389.96 0
39 2019-04-02 08:00:00 389.3603 389.96 389.96 0
40 2019-04-02 08:00:00 388.8500 389.96 389.96 0
41 2019-04-02 08:00:00 390.0000 390.00 389.96 0
42 2019-04-02 08:00:01 389.7452 390.00 390.00 0
43 2019-04-02 08:00:01 389.4223 390.00 390.00 0
44 2019-04-02 08:00:01 389.8000 390.00 390.00 0
In [985]: df.loc[df['currenthigh'] > df['currenthigh_shift'], 'newhighscount'] = (df['currenthigh'] > df['currenthigh_shift']).astype(int).cumsum()
In [989]: df['newhighscount'] = df['newhighscount'].cummax()
In [990]: df
Out[990]:
datetime last currenthigh currenthigh_shift newhighscount
31 2019-04-02 07:57:33 389.8400 389.84 NaN 0
32 2019-04-02 07:57:33 389.8400 389.84 389.84 0
33 2019-04-02 07:57:33 389.8700 389.87 389.84 1
34 2019-04-02 07:57:33 389.8800 389.88 389.87 2
35 2019-04-02 07:57:33 389.9000 389.90 389.88 3
36 2019-04-02 07:57:33 389.9600 389.96 389.90 4
37 2019-04-02 07:57:35 389.9000 389.96 389.96 4
38 2019-04-02 07:57:36 389.9000 389.96 389.96 4
39 2019-04-02 08:00:00 389.3603 389.96 389.96 4
40 2019-04-02 08:00:00 388.8500 389.96 389.96 4
41 2019-04-02 08:00:00 390.0000 390.00 389.96 5
42 2019-04-02 08:00:01 389.7452 390.00 390.00 5
43 2019-04-02 08:00:01 389.4223 390.00 390.00 5
44 2019-04-02 08:00:01 389.8000 390.00 390.00 5
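As for why the original attempt only ever produced 0s and 1s: the right-hand side is evaluated in full before anything is assigned. A rough sketch of what happens, assuming newhighscount starts out as all zeros and the helper columns are built with `cummax()` and `shift(1)` (both are my assumptions):

```python
import pandas as pd

# 'last' values copied from rows 31-44 of the question
last = [389.84, 389.84, 389.87, 389.88, 389.90, 389.96, 389.90,
        389.90, 389.3603, 388.85, 390.00, 389.7452, 389.4223, 389.80]
df = pd.DataFrame({"last": last}, index=range(31, 45))
df["currenthigh"] = df["last"].cummax()
df["currenthigh_shift"] = df["currenthigh"].shift(1)
df["newhighscount"] = 0  # assumed starting state

mask = df["currenthigh"] > df["currenthigh_shift"]

# The right-hand side is computed once, on the all-zero column,
# so it is simply 1 in every row
rhs = df["newhighscount"].cummax() + 1
print(rhs.unique())

# The masked assignment then copies that constant 1 into the new-high rows
df.loc[mask, "newhighscount"] = rhs
print(df["newhighscount"].tolist())
```

So `cummax()` does return something, but it never sees the values being written, because the assignment is not applied row by row.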
Upvotes: 0
Reputation: 59549
You want to label unique 'currenthigh' values. There are many ways to do this:
ngroup
df['NewCount'] = df.groupby('currenthigh', sort=False).ngroup()
rank: this works here because cummax is guaranteed to be monotonically non-decreasing.
df['NewCount'] = (df.currenthigh.rank(method='dense')-1).astype(int)
map
import pandas as pd
arr = pd.unique(df.currenthigh)  # Preserves order of first appearance
df['NewCount'] = df.currenthigh.map({val: i for i, val in enumerate(arr)})
last currenthigh NewCount
datetime
2019-04-02 07:57:33 389.8400 389.84 0
2019-04-02 07:57:33 389.8400 389.84 0
2019-04-02 07:57:33 389.8700 389.87 1
2019-04-02 07:57:33 389.8800 389.88 2
2019-04-02 07:57:33 389.9000 389.90 3
2019-04-02 07:57:33 389.9600 389.96 4
2019-04-02 07:57:35 389.9000 389.96 4
2019-04-02 07:57:36 389.9000 389.96 4
2019-04-02 08:00:00 389.3603 389.96 4
2019-04-02 08:00:00 388.8500 389.96 4
2019-04-02 08:00:00 390.0000 390.00 5
2019-04-02 08:00:01 389.7452 390.00 5
2019-04-02 08:00:01 389.4223 390.00 5
2019-04-02 08:00:01 389.8000 390.00 5
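If you want to convince yourself that the three approaches agree, here is a small sketch on made-up prices (the numbers and the variable names `by_ngroup`, `by_rank` and `by_map` are only for illustration):

```python
import pandas as pd

# Made-up prices, not the question's data
prices = pd.Series([10, 10, 12, 11, 15, 15, 14, 16])
df = pd.DataFrame({"currenthigh": prices.cummax()})

# ngroup: number the groups in order of first appearance
by_ngroup = df.groupby("currenthigh", sort=False).ngroup()

# rank: dense rank starts at 1, so subtract 1 to start counting from 0
by_rank = (df["currenthigh"].rank(method="dense") - 1).astype(int)

# map: position of each value among the unique values
uniques = pd.unique(df["currenthigh"])
by_map = df["currenthigh"].map({val: i for i, val in enumerate(uniques)})

print(pd.DataFrame({"ngroup": by_ngroup, "rank": by_rank, "map": by_map}))
```

All three rely on currenthigh never decreasing. On a column that can go down and come back to an earlier value, they would reuse the earlier label rather than count a new high.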
Upvotes: 0
Reputation: 1312
df['newhighscount'] = df['last'].cummax().diff().gt(0).cumsum()
This calculates the cumulative maximum of the last column, calculates the difference (cummax_t - cummax_{t-1}), checks if the difference is greater than zero and counts the number of times this has been true.
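Broken into its individual steps, the chain looks roughly like this. The prices are made up for illustration and the intermediate variable names are mine:

```python
import pandas as pd

# Made-up prices, not the question's data
last = pd.Series([10, 10, 12, 11, 15, 15, 14, 16])

running_max = last.cummax()   # highest value seen so far
step = running_max.diff()     # change in the running max; NaN in the first row
is_new_high = step.gt(0)      # True where the max moved up; NaN compares as False
count = is_new_high.cumsum()  # running total of new highs

print(pd.DataFrame({"last": last, "cummax": running_max,
                    "diff": step, "new_high": is_new_high, "count": count}))
```

Because the first `diff()` value is NaN and `gt(0)` treats it as False, the very first row is not counted as a new high, which matches the expected output in the question.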
Upvotes: 2