Filter large 2D numpy array based on array calculations

Question

I have a 2D numpy.ndarray called DataSets that has over 2 million items in it. It looks like this...

[['1.3' '8.7' '2.4' ... 'a' '0' '0']
 ['1.5' '8.1' '2.7' ... 'a' '0' '0']
 ['1.9' '8.2' '2.0' ... 'c' '0' '0']
 ...
 ['1.2' '9.4' '2.5' ... 'b' '0' '0']
 ['0.9' '9.0' '2.3' ... 'a' '0' '0']
 ['1.1' '8.4' '2.8' ... 'd' '0' '0']]

I need to filter it based on the product of the first 3 columns in each row, e.g. [0,0] * [0,1] * [0,2].

I'm trying to apply a filter to sort this, but the filter isn't working because the reference is expecting an index.

filter_arr = float(DataSets[,0]) * float(DataSets[,1]) * float(DataSets[,2]) <= 25
FilteredDataSet = DataSets[filter_arr]

If I add an index, the filter doesn't filter properly and also converts it into a 3D array. How can I rectify the filter to produce a 2D array containing only the rows where the product of the first 3 columns is < 25?
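For reference, the distinction the filter trips over is between indexing one element and indexing a whole column. The sketch below uses a small made-up stand-in for DataSets (only the three numeric columns). `DataSets[0, 0]` is a single string that `float()` accepts, while `DataSets[:, 0]` is the whole first column and has to be converted element-wise with `astype(float)`:

```python
import numpy as np

# Small illustrative stand-in for DataSets: numeric values stored as strings
DataSets = np.array([
    ['1.3', '8.7', '2.4'],
    ['0.9', '9.0', '2.3'],
])

# DataSets[0, 0] is one element (row 0, column 0), so float() can convert it
first_element = float(DataSets[0, 0])

# DataSets[:, 0] is every row of column 0; astype converts each element
first_column = DataSets[:, 0].astype(float)

# Product of the first three columns, computed for every row at once
row_products = (DataSets[:, 0].astype(float)
                * DataSets[:, 1].astype(float)
                * DataSets[:, 2].astype(float))
```

The `[:, 0]` form is the valid spelling of what `DataSets[,0]` in the question was reaching for.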

rdesparbes · Accepted Answer

Would that work for you?

import numpy as np

DataSets = np.array([
    ['1.3', '8.7', '2.4'],
    ['1.5', '8.1', '2.7'],
    ['1.9', '8.2', '2.0'],
    ['1.2', '9.4', '2.5'],
    ['0.9', '9.0', '2.3'],
    ['1.1', '8.4', '2.8'],
])
filter_arr = DataSets[:, 0].astype(float) * DataSets[:, 1].astype(float) * DataSets[:, 2].astype(float) <= 25

assert np.all(filter_arr == [False, False, False, False, True, False])
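As an alternative to three separate `astype` calls (not part of the original answer), the same mask can be sketched with `ndarray.prod`, converting the three-column slice once and multiplying along each row. This assumes, as in the question, that the numeric values sit in the first three columns:

```python
import numpy as np

DataSets = np.array([
    ['1.3', '8.7', '2.4'],
    ['1.5', '8.1', '2.7'],
    ['1.9', '8.2', '2.0'],
    ['1.2', '9.4', '2.5'],
    ['0.9', '9.0', '2.3'],
    ['1.1', '8.4', '2.8'],
])

# Convert columns 0..2 to float in one call, then multiply across each row (axis=1)
products = DataSets[:, :3].astype(float).prod(axis=1)
filter_arr = products <= 25
```

With over 2 million rows, converting the slice once avoids repeating the string-to-float step per column, though both versions are fully vectorized.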

You could then write:

FilteredDataSet = DataSets[filter_arr]

assert np.all(FilteredDataSet == [['0.9', '9.0', '2.3']])
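The real data also has non-numeric columns such as 'a', so it matters that only the first three columns are converted: `DataSets.astype(float)` on the whole array would raise a ValueError. A minimal sketch, with made-up rows shaped like the question's (the elided "..." columns are left out), showing that a 1-D boolean mask keeps whole rows and the result stays 2D:

```python
import numpy as np

# Illustrative rows: three numeric strings followed by non-numeric columns
DataSets = np.array([
    ['1.3', '8.7', '2.4', 'a', '0', '0'],
    ['0.9', '9.0', '2.3', 'a', '0', '0'],
    ['1.1', '8.4', '2.8', 'd', '0', '0'],
])

# Convert only the numeric columns; the letter column is never touched
filter_arr = (DataSets[:, 0].astype(float)
              * DataSets[:, 1].astype(float)
              * DataSets[:, 2].astype(float)) <= 25

# A 1-D boolean mask indexes the first axis, selecting complete rows
FilteredDataSet = DataSets[filter_arr]
```

Only the second row (product 18.63) passes, and `FilteredDataSet` keeps all six columns.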
