Reputation: 577
I have the following pandas
time vwap sumQty rvol p_y p_x
2019-01-07 09:00:00 10000.0 500.0 NaN NaN [1.0, 1.0, nan, nan, nan]
2019-01-07 09:05:00 2000.0 400.0 NaN -8000.0 [1.0, 2.0, 10000.0, nan, nan]
2019-01-07 09:10:00 1500.0 800.0 NaN -500.0 [1.0, 3.0, 2000.0, 10000.0, nan]
2019-01-07 09:15:00 2400.0 710.0 NaN 900.0 [1.0, 4.0, 1500.0, 2000.0, 10000.0]
For columns df['p_x'] and df['p_y'], I would like to stack/ combine every row with a rolling 3 windows before it and create two new columns df['r_x'] and df['r_y] for them respectively
Meaning at time = 09:15:00, df['r_y'] = [-8000, -500, 900]
and df['r_x'] = [[1.0, 2.0, 10000.0, nan, nan], [1.0, 3.0, 2000.0, 10000.0, nan], [1.0, 4.0, 1500.0, 2000.0, 10000.0]]
etc.
I have tried: df['r_x'] = df['p_x'].rolling(3).apply(lambda x: list(x))
. However, I got this error:
~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
1701 def apply(self, func, raw=None, args=(), kwargs={}):
1702 return super(Rolling, self).apply(
-> 1703 func, raw=raw, args=args, kwargs=kwargs)
1704
1705 @Substitution(name='rolling')
~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
1010
1011 return self._apply(f, func, args=args, kwargs=kwargs,
-> 1012 center=False, raw=raw)
1013
1014 def sum(self, *args, **kwargs):
~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _apply(self, func, name, window, center, check_minp, **kwargs)
839 results = []
840 for b in blocks:
--> 841 values = self._prep_values(b.values)
842
843 if values.size == 0:
~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _prep_values(self, values, kill_inf)
212 except (ValueError, TypeError):
213 raise TypeError("cannot handle this type -> {0}"
--> 214 "".format(values.dtype))
215
216 if kill_inf:
TypeError: cannot handle this type -> object
Could someone help me out? Thank you!
Upvotes: 1
Views: 554
Reputation: 577
!pip install rolling
import rolling
window_size = 3
df = pd.DataFrame({'a':[[1,2,3],[4,5],[6],[7,8],[9,10,11]]})
b = (window_size-1)*[np.nan] + list(rolling.Apply(df.a, window_size, operation=lambda x: list(np.block(x))))
df['b'] = b
df
Upvotes: 0
Reputation: 862611
Here is necessary numpy:
a = np.array(df['p_x'].values.tolist())
print (a)
[[1.0e+00 1.0e+00 nan nan nan]
[1.0e+00 2.0e+00 1.0e+04 nan nan]
[1.0e+00 3.0e+00 2.0e+03 1.0e+04 nan]
[1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]
#https://stackoverflow.com/a/45748530
L = 3
s0,s1 = a.strides
shp = a.shape
out = a[range(L) + np.arange(shp[0]-L+1)[:,None]]
print (out)
[[[1.0e+00 1.0e+00 nan nan nan]
[1.0e+00 2.0e+00 1.0e+04 nan nan]
[1.0e+00 3.0e+00 2.0e+03 1.0e+04 nan]]
[[1.0e+00 2.0e+00 1.0e+04 nan nan]
[1.0e+00 3.0e+00 2.0e+03 1.0e+04 nan]
[1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]
Or list comprehension:
#https://stackoverflow.com/a/45748613
L = 3
out = np.array([a[i:i + 3] for i in range(0, len(a) - 2)])
print (out)
[[[1.0e+00 1.0e+00 nan nan nan]
[1.0e+00 2.0e+00 1.0e+04 nan nan]
[1.0e+00 3.0e+00 2.0e+03 1.0e+04 nan]]
[[1.0e+00 2.0e+00 1.0e+04 nan nan]
[1.0e+00 3.0e+00 2.0e+03 1.0e+04 nan]
[1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]
Upvotes: 1