Reputation: 103
I'm trying to make a version of my program faster using as much Pandas and Numpy as possible. I am new to Numpy but have been grasping most of it, but I am having trouble with conditional formatting a column with the max of a range. This is the code I am trying to use to achieve this:
x=3
df1['Max']=numpy.where(df1.index>=x,max(df1.High[-x:],0))
Basically, I am trying to conditionally put the maximum value over the last 3 entries into a cell and repeat down the column. Any and all help is appreciated.
Upvotes: 4
Views: 1721
Reputation: 221574
Use Scipy's maximum_filter
-
from scipy.ndimage.filters import maximum_filter1d
df['max'] = maximum_filter1d(df.High,size=3,origin=1,mode='nearest')
Basically, maximum_filter operates in a sliding window looking for maximum in that window. Now, by default each such max
computation would be performed with window being centered at the index itself. Since, we are looking to go three elements before and ending at the current one, we need to change that centeredness with the parameter origin
. Therefore, we have it set at 1
.
Sample run -
In [21]: df
Out[21]:
High max
0 13 13
1 77 77
2 16 77
3 30 77
4 25 30
5 98 98
6 79 98
7 58 98
8 51 79
9 23 58
Runtime test
Got me interested to see how this Scipy's sliding max operation performs against Pandas's rolling max method on performance. Here's some results on big datasizes -
In [55]: df = pd.DataFrame(np.random.randint(0,99,(10000)),columns=['High'])
In [56]: %%timeit # @Merlin's rolling based solution :
...: df['max'] = df.High.rolling(window=3, min_periods=1).max()
...:
1000 loops, best of 3: 1.35 ms per loop
In [57]: %%timeit # Using Scipy's max filter :
...: df['max1'] = maximum_filter1d(df.High,size=3,\
...: origin=1,mode='nearest')
...:
1000 loops, best of 3: 487 µs per loop
Upvotes: 5
Reputation: 25639
Here is the logic on np.where
numpy.where('test something,if true ,if false)
I think you need below.
dd= {'to': [100, 200, 300, 400, -500, 600, 700,800, 900, 1000]}
df = pd.DataFrame(dd)
df
to
0 100
1 200
2 300
3 400
4 -500
5 600
6 700
7 800
8 900
9 1000
df['Max'] = df.rolling(window=3, min_periods=1).max()
to Max
0 100 100.0
1 200 200.0
2 300 300.0
3 400 400.0
4 -500 400.0
5 600 600.0
6 700 700.0
7 800 800.0
8 900 900.0
9 1000 1000.0
Upvotes: 3