atjw94
atjw94

Reputation: 577

TypeError: cannot handle this type -> object when using pandas rolling().apply(lambda: )

I have the following pandas

time                     vwap   sumQty  rvol    p_y        p_x
2019-01-07 09:00:00     10000.0 500.0   NaN   NaN   [1.0, 1.0, nan, nan, nan]
2019-01-07 09:05:00     2000.0  400.0   NaN -8000.0 [1.0, 2.0, 10000.0, nan, nan]
2019-01-07 09:10:00     1500.0  800.0   NaN -500.0  [1.0, 3.0, 2000.0, 10000.0, nan]
2019-01-07 09:15:00     2400.0  710.0   NaN 900.0   [1.0, 4.0, 1500.0, 2000.0, 10000.0]

For columns df['p_x'] and df['p_y'], I would like to stack/ combine every row with a rolling 3 windows before it and create two new columns df['r_x'] and df['r_y] for them respectively

Meaning at time = 09:15:00, df['r_y'] = [-8000, -500, 900] and df['r_x'] = [[1.0, 2.0, 10000.0, nan, nan], [1.0, 3.0, 2000.0, 10000.0, nan], [1.0, 4.0, 1500.0, 2000.0, 10000.0]] etc.

I have tried: df['r_x'] = df['p_x'].rolling(3).apply(lambda x: list(x)). However, I got this error:

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
   1701     def apply(self, func, raw=None, args=(), kwargs={}):
   1702         return super(Rolling, self).apply(
-> 1703             func, raw=raw, args=args, kwargs=kwargs)
   1704 
   1705     @Substitution(name='rolling')

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in apply(self, func, raw, args, kwargs)
   1010 
   1011         return self._apply(f, func, args=args, kwargs=kwargs,
-> 1012                            center=False, raw=raw)
   1013 
   1014     def sum(self, *args, **kwargs):

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _apply(self, func, name, window, center, check_minp, **kwargs)
    839         results = []
    840         for b in blocks:
--> 841             values = self._prep_values(b.values)
    842 
    843             if values.size == 0:

~/dev/jup/venv/lib/python3.7/site-packages/pandas/core/window.py in _prep_values(self, values, kill_inf)
    212             except (ValueError, TypeError):
    213                 raise TypeError("cannot handle this type -> {0}"
--> 214                                 "".format(values.dtype))
    215 
    216         if kill_inf:

TypeError: cannot handle this type -> object

Could someone help me out? Thank you!

Upvotes: 1

Views: 554

Answers (2)

atjw94
atjw94

Reputation: 577

!pip install rolling
import rolling
window_size = 3
df = pd.DataFrame({'a':[[1,2,3],[4,5],[6],[7,8],[9,10,11]]})
b = (window_size-1)*[np.nan] + list(rolling.Apply(df.a, window_size, operation=lambda x: list(np.block(x))))
df['b'] = b
df

Upvotes: 0

jezrael
jezrael

Reputation: 862611

Here is necessary numpy:

a = np.array(df['p_x'].values.tolist())
print (a)
[[1.0e+00 1.0e+00     nan     nan     nan]
 [1.0e+00 2.0e+00 1.0e+04     nan     nan]
 [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
 [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]

#https://stackoverflow.com/a/45748530
L = 3 
s0,s1 = a.strides
shp = a.shape
out = a[range(L) + np.arange(shp[0]-L+1)[:,None]]
print (out)
[[[1.0e+00 1.0e+00     nan     nan     nan]
  [1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]]

 [[1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
  [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]

Or list comprehension:

#https://stackoverflow.com/a/45748613
L = 3 
out = np.array([a[i:i + 3] for i in range(0, len(a) - 2)])
print (out)
[[[1.0e+00 1.0e+00     nan     nan     nan]
  [1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]]

 [[1.0e+00 2.0e+00 1.0e+04     nan     nan]
  [1.0e+00 3.0e+00 2.0e+03 1.0e+04     nan]
  [1.0e+00 4.0e+00 1.5e+03 2.0e+03 1.0e+04]]]

Upvotes: 1

Related Questions