Jamgreen

Reputation: 11049

Conditions in NumPy in Python

I have an array, data, created with NumPy in Python, containing:

1 400
3 300
2 350
etc.

I want to calculate the mean of column 1 but only for those rows whose second column value is greater than 350.

I think it's something like

data[ > 350][:,1].mean()

I know that > 350 is not correct, but I don't know how to specify that it should check the second column.

Upvotes: 0

Views: 69

Answers (2)

Carsten

Reputation: 18446

You're almost there. You can test which rows have a second-column value greater than 350 by using:

data[:,1] > 350

This creates a NumPy array of booleans (print it to see what it looks like: it holds True or False for each element of data[:,1], depending on whether that element satisfies the condition), which you can use to index data:

data[ data[:,1] > 350 ][:,1].mean()
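
For reference, here is a minimal runnable sketch of the same idea. It assumes data holds only the three rows shown in the question (the rest is unknown), and the names mask, mean_second and mean_first are just illustrative. Note that index 1 is the second column; if "column 1" in the question meant the first column, index 0 would be the one to average.

```python
import numpy as np

# Assumption: only the three rows shown in the question.
data = np.array([[1, 400],
                 [3, 300],
                 [2, 350]])

# Boolean mask: True where the second column (index 1) is greater than 350.
mask = data[:, 1] > 350

# Mean of the second column over the selected rows.
# data[mask, 1] selects the same elements as data[mask][:, 1].
mean_second = data[mask, 1].mean()

# Mean of the first column (index 0) over the same rows,
# in case "column 1" was meant as the first column.
mean_first = data[mask, 0].mean()

print(mask)
print(mean_second, mean_first)
```

If no row satisfies the condition, the selection is empty and mean() returns nan with a RuntimeWarning, so it may be worth checking mask.any() first.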

Upvotes: 4

EdChum

Reputation: 394159

Are you looking for this?

In [54]: a[a[:,1]>350].mean()
Out[54]: 200.5

The boolean condition generates a mask you can use to filter the array:

In [55]: a[a[:,1]>350]
Out[55]: array([[  1, 400]], dtype=int64)

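mean accepts an axis parameter: with no argument it averages every element of the filtered array (which is why the result above is 200.5, the mean of 1 and 400), while axis=0 gives column-wise means and axis=1 gives row-wise means.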
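
A small sketch of the three variants, assuming a contains just the three rows from the question; the variable names filtered, overall, per_column and per_row are only for illustration:

```python
import numpy as np

# Assumption: only the three rows shown in the question.
a = np.array([[1, 400],
              [3, 300],
              [2, 350]])

# Keep the rows whose second column is greater than 350.
filtered = a[a[:, 1] > 350]

overall = filtered.mean()            # mean of every element in the filtered rows
per_column = filtered.mean(axis=0)   # one mean per column
per_row = filtered.mean(axis=1)      # one mean per remaining row

print(overall)
print(per_column)
print(per_row)
```

For the question as asked, the column-wise result is probably the relevant one: per_column[1] is the mean of the second column over the selected rows.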

Upvotes: 1
