user366312

Reputation: 16896

Selecting rows if column values meet certain condition

Given a NumPy array, I want to select all rows where the value in the second column is greater than or equal to a certain threshold. Here is my current attempt:

import numpy as np

#inp = input("Input N : ")
#N = float(inp);

N = 5

#ids = np.arange(1, N+1, 1)
#scores = np.random.uniform(low=2.0, high=6.0, size=(N,))

ids = [ 1.,          2.,          3.,          4.,          5.,        ]
scores = [ 3.75320381,  4.32400937,  2.43537978,  3.73691774,  2.5163266, ]

ids_col = ids.copy()
scores_col = scores.copy()

students_mat = np.column_stack([ids_col, scores_col])

accepted = scores_col[scores_col[:]>=4.0]

accepted_std = students_mat[:, accepted]

print(accepted_std)

Error

>>> (executing file "arrays.py")
Traceback (most recent call last):
  File "D:\I (Blank Space)\Python\arrays.py", line 19, in <module>
    accepted = scores_col[scores_col[:]>=4.0]
TypeError: '>=' not supported between instances of 'list' and 'float'

>>> 

Upvotes: 2

Views: 6782

Answers (1)

Michael Gecht

Reputation: 1444

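To answer your initial question: define both ids and scores as np.array instead of plain lists, because a Python list does not support element-wise comparison with a float. With that change your code works up to the line that defines accepted_std (left out here), which still fails because accepted holds score values rather than indices: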

import numpy as np
N = 5

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774,  2.5163266])

ids_col = ids.copy()
scores_col = scores.copy()

students_mat = np.column_stack([ids_col, scores_col])

accepted = scores_col[scores_col[:]>=4.0]

print(accepted)
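To illustrate why the accepted_std line from the question still fails after this change, here is a minimal sketch. It assumes the same data as above; the `failed` flag is only there for illustration. The point is that `accepted` contains the scores themselves (floats), and NumPy only accepts integer or boolean arrays as indices.

```python
import numpy as np

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])
students_mat = np.column_stack([ids, scores])

# `accepted` holds the score values that pass the threshold, not their positions.
accepted = scores[scores >= 4.0]

try:
    # Indexing with a float array is not allowed.
    students_mat[:, accepted]
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)
```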

I think what you actually want is to get all rows where the score is at or above a certain threshold. For this, you can change your code to:

import numpy as np
N = 5

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774,  2.5163266])

students_mat = np.column_stack([ids, scores])

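# students_mat[:, 1] is the second column (the scores);
# students_mat[1] would be the second row instead.
accepted_std = students_mat[np.where(students_mat[:, 1] >= 4.)]

print(accepted_std)
[[2.         4.32400937]]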
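As a possible alternative, a boolean mask can be used directly without np.where. The sketch below assumes the same data; `rows_at_or_above` is a hypothetical helper name introduced only for this example. Note that the column is selected with `mat[:, col]`, since `mat[1]` would pick the second row rather than the second column.

```python
import numpy as np

def rows_at_or_above(mat, col, threshold):
    """Hypothetical helper: return the rows of `mat` whose value in column `col` is >= `threshold`."""
    mask = mat[:, col] >= threshold  # one boolean per row
    return mat[mask]

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])
students_mat = np.column_stack([ids, scores])

accepted_std = rows_at_or_above(students_mat, 1, 4.0)

# Equivalent selection with np.where on the same column.
same = students_mat[np.where(students_mat[:, 1] >= 4.0)]

print(accepted_std)
```

Both forms should select the same rows; the mask version just skips the intermediate index array.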

Upvotes: 1
