Reputation: 11049
I have an array data created with NumPy in Python, containing
1 400
3 300
2 350
etc.
I want to calculate the mean of column 1 but only for those rows whose second column value is greater than 350.
I think it's something like
data[ > 350][:,1].mean()
I know that > 350 is not correct, but I don't know how to specify that it should check the second column.
Upvotes: 0
Views: 69
Reputation: 18446
You're almost there. You can select all the rows where the second column is greater than 350 by using:
data[:,1] > 350
This will create a NumPy array of booleans (print it to see what it looks like; it's just True and False values in the shape of data[:,1], depending on whether they satisfy the condition), which you can use to index data:
data[ data[:,1] > 350 ][:,1].mean()
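To see the mask at work, here is a minimal sketch using only the three rows shown in the question (the rows behind "etc." are unknown, so they are left out). The names mask, mean_first and mean_second are just illustrative. Since "column 1" in the question could mean either the first column or index 1, both are computed:

```python
import numpy as np

# Only the three rows listed in the question; the "etc." rows are not known.
data = np.array([[1, 400],
                 [3, 300],
                 [2, 350]])

# Boolean mask: True for rows whose second column (index 1) is greater than 350.
mask = data[:, 1] > 350
print(mask)

# Combine the row mask and the column index in a single indexing step.
mean_first = data[mask, 0].mean()    # mean of the first column over the kept rows
mean_second = data[mask, 1].mean()   # mean of the second column over the kept rows
print(mean_first, mean_second)
```

Writing data[mask, 1] selects the same values as data[mask][:,1], but without building the intermediate filtered array first.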
Upvotes: 4
Reputation: 394159
Are you looking for this?
In [54]:
a[a[:,1]>350].mean()
Out[54]:
200.5
The boolean condition generates a mask you can use to filter the array:
In [55]:
a[a[:,1]>350]
Out[55]:
array([[ 1, 400]], dtype=int64)
mean accepts an axis parameter: called without it, as above, it averages every element of the filtered array; axis=0 gives the column-wise means and axis=1 gives the row-wise means.
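A short sketch of the axis options, again using only the three rows from the question. Note that the threshold here is changed to >= 350 (my assumption, not the asker's condition) so that two rows survive the filter and the row-wise and column-wise results actually differ:

```python
import numpy as np

# The three rows listed in the question.
a = np.array([[1, 400],
              [3, 300],
              [2, 350]])

# Assumption for illustration: >= 350 instead of > 350, so two rows remain.
filtered = a[a[:, 1] >= 350]

overall = filtered.mean()           # single value over every element
per_column = filtered.mean(axis=0)  # one mean per column
per_row = filtered.mean(axis=1)     # one mean per row

print(overall)
print(per_column)
print(per_row)
```

For the per-column case, per_column[0] is the mean of the first column over the filtered rows and per_column[1] is the mean of the second.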
Upvotes: 1