I want to extract the rows of an array that meet several conditions.
This is what the original array looks like:
original = array([[Timestamp('2018-01-15 01:59:00'), 329, 30, 5],
[Timestamp('2018-01-15 01:59:00'), 326, 25, 3],
[Timestamp('2018-01-15 02:00:00'), 324, 22, 34],
...,
[Timestamp('2018-01-15 21:57:00'), 322, 23, 3],
[Timestamp('2018-01-15 21:57:00'), 323, 30, 9],
[Timestamp('2018-01-15 21:59:00'), 323, 1, 19]], dtype=object)
The conditions are:
1) Either the 3rd or the 4th value is greater than 25.
2) One of the 3rd and 4th values is at least twice the other.
3) The row's timestamp falls between 01:00 and 06:00.
So, according to these conditions, the first row should be extracted: 30 is greater than 25, 30 is more than twice 5, and the row was recorded at 01:59:00, which is between 01:00 and 06:00.
Is it possible to do this only with np.where?
Edit: I managed to do it with pandas:
>>> import pandas as pd
>>> from datetime import datetime
>>> # trade_reset: the object array shown above
>>> df_text = pd.DataFrame(trade_reset, columns=['date', 'freq', 'in', 'out'])
>>> df_text = df_text[(df_text['in'] >= 30) | (df_text['out'] >= 30)]
>>> df_text = df_text[(df_text['in'] > df_text['out']*2) | (df_text['out'] >= df_text['in']*2)]
>>> df_text[(df_text['date'] < datetime(2018, 1, 15, 6)) & (df_text['date'] > datetime(2018, 1, 15, 1))]
For convenience, define Timestamp as a np.datetime64 creator:
In [492]: Timestamp=lambda x: np.datetime64(x, 's')
In [493]: Timestamp('2018-01-15 01:59:00')
Out[493]: numpy.datetime64('2018-01-15T01:59:00')
In [494]: original = np.array([[Timestamp('2018-01-15 01:59:00'), 329, 30, 5],
...: [Timestamp('2018-01-15 01:59:00'), 326, 25, 3],
...: [Timestamp('2018-01-15 02:00:00'), 324, 22, 34],
...: [Timestamp('2018-01-15 21:57:00'), 322, 23, 3],
...: [Timestamp('2018-01-15 21:57:00'), 323, 30, 9],
...: [Timestamp('2018-01-15 21:59:00'), 323, 1, 19]], dtype=object)
In [495]: original
Out[495]:
array([[numpy.datetime64('2018-01-15T01:59:00'), 329, 30, 5],
[numpy.datetime64('2018-01-15T01:59:00'), 326, 25, 3],
[numpy.datetime64('2018-01-15T02:00:00'), 324, 22, 34],
[numpy.datetime64('2018-01-15T21:57:00'), 322, 23, 3],
[numpy.datetime64('2018-01-15T21:57:00'), 323, 30, 9],
[numpy.datetime64('2018-01-15T21:59:00'), 323, 1, 19]],
dtype=object)
Now we can do the time test with:
In [500]: original[:,0]<Timestamp('2018-01-15 06:00:00')
Out[500]: array([ True, True, True, False, False, False])
In [501]: original[:,0]>Timestamp('2018-01-15 01:00:00')
Out[501]: array([ True, True, True, True, True, True])
In [502]: mask = Out[500] & Out[501]
In [503]: mask
Out[503]: array([ True, True, True, False, False, False])
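The two comparisons above hard-code the date 2018-01-15. If the 01:00 to 06:00 window should apply to every day in the data (an assumption; the question only shows one day), a minimal sketch is to cast the column to datetime64[s] and subtract each row's own midnight. The names times, time_of_day and time_mask are just illustrative:

```python
import numpy as np

# Rows as in the question: (timestamp, freq, in, out) in an object array
original = np.array([
    [np.datetime64('2018-01-15T01:59:00'), 329, 30, 5],
    [np.datetime64('2018-01-15T01:59:00'), 326, 25, 3],
    [np.datetime64('2018-01-15T02:00:00'), 324, 22, 34],
    [np.datetime64('2018-01-15T21:57:00'), 322, 23, 3],
    [np.datetime64('2018-01-15T21:57:00'), 323, 30, 9],
    [np.datetime64('2018-01-15T21:59:00'), 323, 1, 19],
], dtype=object)

# Turn the object column into a proper datetime64 array
times = original[:, 0].astype('datetime64[s]')

# Time since midnight of each row's own day (a timedelta64[s] array)
time_of_day = times - times.astype('datetime64[D]')

# Strictly between 01:00 and 06:00, independent of the date
time_mask = (time_of_day > np.timedelta64(1, 'h')) & (time_of_day < np.timedelta64(6, 'h'))
print(time_mask)
```

For this sample it gives the same mask as Out[503]; the difference only shows up once the data spans more than one day.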
Test on columns 2 and 3 (the 3rd and 4th values):
In [509]: (original[:,[2,3]]>=30).any(axis=1)
Out[509]: array([ True, False, True, False, True, False])
and
In [506]: (original[:,2]>(original[:,3]*2)) | (original[:,3]>=(original[:,2]*2))
...:
Out[506]: array([ True, True, False, True, True, True])
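Condition 2 can also be written as one comparison between the larger and the smaller value of each pair. This sketch uses >= on both sides, so it would differ from the test above only when the first value is exactly twice the second; vals is a hypothetical numeric copy of columns 2 and 3:

```python
import numpy as np

# Columns 2 and 3 of the example rows, as a plain integer array
vals = np.array([[30, 5], [25, 3], [22, 34], [23, 3], [30, 9], [1, 19]])
a, b = vals[:, 0], vals[:, 1]

# Larger and smaller value of each pair
big = np.maximum(a, b)
small = np.minimum(a, b)

# "One value is at least twice the other" as a single comparison
ratio_mask = big >= 2 * small

# Same result as the explicit two-sided test (with >= on both sides)
assert (ratio_mask == ((a >= 2 * b) | (b >= 2 * a))).all()
print(ratio_mask)
```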
and all three together:
In [510]: mask & Out[509] & Out[506]
Out[510]: array([ True, False, False, False, False, False])
In [511]: np.where(Out[510])
Out[511]: (array([0]),)
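With a single argument, np.where just returns the indices of the True elements (like np.nonzero), so the rows themselves can be pulled out either with those indices or directly with the boolean mask. A small sketch, with the combined mask from Out[510] typed in by hand as final_mask (an illustrative name):

```python
import numpy as np

original = np.array([
    [np.datetime64('2018-01-15T01:59:00'), 329, 30, 5],
    [np.datetime64('2018-01-15T01:59:00'), 326, 25, 3],
    [np.datetime64('2018-01-15T02:00:00'), 324, 22, 34],
    [np.datetime64('2018-01-15T21:57:00'), 322, 23, 3],
    [np.datetime64('2018-01-15T21:57:00'), 323, 30, 9],
    [np.datetime64('2018-01-15T21:59:00'), 323, 1, 19],
], dtype=object)

# Combined mask, copied from the result above
final_mask = np.array([True, False, False, False, False, False])

# Index rows with the positions returned by np.where ...
rows_by_index = original[np.where(final_mask)[0]]
# ... or index with the boolean mask directly
rows_by_mask = original[final_mask]

assert (rows_by_index == rows_by_mask).all()
print(rows_by_mask)
```

Boolean indexing skips the intermediate index array, so np.where is only needed when the positions themselves are wanted.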
Sometimes the object dtype hinders calculations, usually when a function can't delegate the task to methods of the objects. Here the Python integers can be compared, so the object arrays can be compared too. In a large array these comparisons might be faster if part of the array were first converted to a 2d numeric array:
In [512]: original[:,1:].astype(int)
Out[512]:
array([[329, 30, 5],
[326, 25, 3],
[324, 22, 34],
[322, 23, 3],
[323, 30, 9],
[323, 1, 19]])
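Putting this together, a sketch of the whole filter that converts the columns once and then works only on typed numeric arrays. It keeps the thresholds used above (>= 30 and the fixed 2018-01-15 window); times, nums, keep and the other mask names are illustrative:

```python
import numpy as np

original = np.array([
    [np.datetime64('2018-01-15T01:59:00'), 329, 30, 5],
    [np.datetime64('2018-01-15T01:59:00'), 326, 25, 3],
    [np.datetime64('2018-01-15T02:00:00'), 324, 22, 34],
    [np.datetime64('2018-01-15T21:57:00'), 322, 23, 3],
    [np.datetime64('2018-01-15T21:57:00'), 323, 30, 9],
    [np.datetime64('2018-01-15T21:59:00'), 323, 1, 19],
], dtype=object)

# Split the object array into typed columns once
times = original[:, 0].astype('datetime64[s]')
nums = original[:, 1:].astype(int)      # columns: freq, in, out
a, b = nums[:, 1], nums[:, 2]           # the 3rd and 4th values

# The same three tests as above, now on numeric arrays
time_mask = ((times > np.datetime64('2018-01-15T01:00:00'))
             & (times < np.datetime64('2018-01-15T06:00:00')))
size_mask = (nums[:, 1:] >= 30).any(axis=1)
ratio_mask = (a > 2 * b) | (b >= 2 * a)

keep = time_mask & size_mask & ratio_mask
print(np.where(keep))   # indices of the matching rows
print(original[keep])   # the matching rows themselves
```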
Pandas seems to be 'happier' dealing with object dtypes, but I think that flexibility comes at a speed cost.