Get column number of elements that are greater than a threshold in 2D numpy array

Question

I have an array like this and would like to get, for each row, the column numbers where the value is over the threshold of 0.6:

X = array([[ 0.16,  0.40,  0.61,  0.48,  0.20],
        [ 0.42,  0.79,  0.64,  0.54,  0.52],
        [ 0.64,  0.64,  0.24,  0.63,  0.43],
        [ 0.33,  0.54,  0.61,  0.43,  0.29],
        [ 0.25,  0.56,  0.42,  0.69,  0.62]])

Result would be:

[[2],
[1, 2],
[0, 1, 3],
[2],
[3, 4]]

Is there a better way of doing this than with a double for-loop?

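def get_column_over_threshold(data, threshold):
    # One (initially empty) list of column indices per row
    column_numbers = [[] for _ in range(len(data))]
    for row_index, sample in enumerate(data):
        for col_index, value in enumerate(sample):
            # "over the threshold" means strictly greater
            if value > threshold:
                column_numbers[row_index].append(col_index)
    return column_numbers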

jmd_dk · Accepted Answer

For each row you can ask for the indices where the elements are greater than 0.6:

result = [where(row > 0.6) for row in X]

This performs the computation you want, but the format of result is somewhat inconvenient, since the result of where in this case is a tuple of size 1, containing a NumPy array with the indices. We can replace where with flatnonzero to get the array directly rather than the tuple. To obtain a list of lists, we explicitly cast this array to a list:

result = [list(flatnonzero(row > 0.6)) for row in X]

(In the code above I assume you have used from numpy import *)
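
For reference, here is a minimal self-contained sketch of the same idea using the explicit np. prefix instead of the star import. The function name columns_over_threshold is only an illustrative choice, and .tolist() is used in place of list(...) so that the inner lists hold plain Python ints rather than NumPy integers:

```python
import numpy as np

X = np.array([[0.16, 0.40, 0.61, 0.48, 0.20],
              [0.42, 0.79, 0.64, 0.54, 0.52],
              [0.64, 0.64, 0.24, 0.63, 0.43],
              [0.33, 0.54, 0.61, 0.43, 0.29],
              [0.25, 0.56, 0.42, 0.69, 0.62]])


def columns_over_threshold(data, threshold):
    """Return, for each row, the column indices whose value exceeds threshold.

    Hypothetical helper name, chosen for this example only.
    """
    # row > threshold is a boolean array; flatnonzero gives the indices of
    # its True entries as a 1-D array, and tolist() converts that array
    # to a list of plain Python ints.
    return [np.flatnonzero(row > threshold).tolist() for row in data]


# np.where on a 1-D condition returns a tuple holding a single index array,
# which is why flatnonzero is more convenient here.
where_result = np.where(X[0] > 0.6)

result = columns_over_threshold(X, 0.6)
print(result)  # [[2], [1, 2], [0, 1, 3], [2], [3, 4]]
```

The rows can have different numbers of matches, so the result stays a list of lists rather than a rectangular array.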
