Reputation: 11
I'm trying to create a frequency-of-occurrence map from an array of time, lat, lon. I should end up with a 2D lat/lon array of frequencies. The code below outlines my approach, and I run into problems at step d, when I convert the inverted boolean array mask to numerical values. I accidentally found a way to do it (np.mean), but I don't know why it works. I can't see why np.mean turned booleans to floats but then didn't actually calculate the mean along the requested axis. I had to apply np.mean again to get the desired result. I feel there must be a right way to convert a boolean array to floats or integers. Also, if you can think of a better way to accomplish the task, fire away. My numpy mojo is weak and this was the only approach I could come up with.
import numpy as np
# test 3D array in time, lat, lon; values are percents
# real array is size=(30,721,1440)
# np.random.random_integers is deprecated; randint excludes the upper bound, hence 101
a = np.random.randint(0, 101, size=(3,4,5))
print(a)
# Exclude all data outside the interval 0 - 20 (first quintile)
# Repeat for 21-40, 41-60, 61-80, 81-100
b = np.ma.masked_outside(a, 0, 20)
print("\n\nMasked array: ")
print(b)
# Because mask is false where data within quintile, need to invert
c = [~b.mask]
print("\n\nInverted mask: ")
print(c)
# Accidental way to turn True/False to 1./0., but that's what I want
d = np.mean(c, axis = 0)
print("\n\nWhy does this work? How should I be doing it?")
print(d)
# This is the mean I want. Gives desired end result
e = np.mean(d, axis = 0)
print("\n\nFrequency Map")
print(e)
How do I convert the boolean values in my (inverted) array mask to numerical (1 and 0)?
Upvotes: 0
Views: 6043
Reputation: 353179
It "works" because your c isn't what you think it is:
>>> c
[array([[[False, False, False, False, False],
[False, False, False, False, True],
[False, False, False, False, False],
[False, False, False, False, False]],
[[False, False, False, False, False],
[False, False, False, False, True],
[False, False, False, True, False],
[False, False, False, False, True]],
[[False, False, False, False, False],
[False, False, False, False, False],
[False, True, False, False, False],
[ True, False, True, True, False]]], dtype=bool)]
>>> type(c)
<type 'list'>
It's not an array, it's a list containing an array. So when you take
d = np.mean(c, axis = 0)
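you're taking the mean over a list of one element, which is simply that element itself (but converted to float, because that's what mean does, and float(True) == 1.0). NumPy first turns the list into an array of shape (1, 3, 4, 5), so axis 0 is the length-1 list axis rather than time.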
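To see this concretely, here is a small sketch comparing what NumPy sees in each case. The array named mask is made-up data standing in for ~b.mask, and the variable names are only for illustration:

```python
import numpy as np

# Hypothetical stand-in for ~b.mask: boolean array of shape (time, lat, lon)
rng = np.random.default_rng(0)
mask = rng.integers(0, 101, size=(3, 4, 5)) <= 20

wrapped = [mask]                 # a list holding one array, as in the question
as_array = np.asarray(wrapped)   # the array np.mean effectively works on
print(as_array.shape)            # (1, 3, 4, 5): axis 0 is the length-1 list axis

d = np.mean(wrapped, axis=0)     # averages over that length-1 axis only
print(d.shape, d.dtype)          # (3, 4, 5) float64: same values, now floats

e = np.mean(mask, axis=0)        # averages over time, the intended axis
print(e.shape)                   # (4, 5)
```

The second np.mean in the question was the first one that actually averaged over time.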
Instead, lose the unneeded brackets:
>>> c = ~b.mask
>>> output = c.mean(axis=0)
>>> output
array([[ 0. , 0. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0.66666667],
[ 0. , 0.33333333, 0. , 0.33333333, 0. ],
[ 0.33333333, 0. , 0.33333333, 0.33333333, 0.33333333]])
>>> np.allclose(output, e)
True
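As for a more direct way to build the map, one option (not part of the approach above, just a sketch) is to skip the masked array and average a boolean comparison directly. This assumes integer percents and the quintile bounds listed in the question; the names bounds and freq_maps are made up for illustration:

```python
import numpy as np

# Hypothetical test data: (time, lat, lon) integer percents in 0..100
rng = np.random.default_rng(0)
a = rng.integers(0, 101, size=(3, 4, 5))

# Inclusive quintile bounds taken from the question's comments
bounds = [(0, 20), (21, 40), (41, 60), (61, 80), (81, 100)]

# For each quintile, the mean of the boolean test over time is the
# fraction of time steps that fall inside that quintile at each lat/lon
freq_maps = np.stack([((a >= lo) & (a <= hi)).mean(axis=0) for lo, hi in bounds])
print(freq_maps.shape)   # (5, 4, 5): one lat/lon map per quintile
```

Because every integer value lands in exactly one quintile, the five maps should sum to 1 at each grid point, which makes a handy sanity check.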
BTW, the canonical way to convert from bool to float or int is using astype, e.g. c.astype(float) or c.astype(int), but to be honest sometimes I'm lazy and simply write c + 0.0 or c + 0. You didn't hear that from me, though.
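A quick sketch of those conversions side by side, using a tiny made-up boolean array in place of the real mask:

```python
import numpy as np

# Hypothetical boolean array standing in for the inverted mask
c = np.array([[True, False], [False, True]])

as_float = c.astype(float)   # explicit conversion to 1.0 / 0.0
as_int = c.astype(int)       # explicit conversion to 1 / 0
lazy_float = c + 0.0         # same values, relying on type promotion
lazy_int = c + 0             # likewise, promoted to an integer dtype

print(as_float.dtype, as_int.dtype, lazy_float.dtype, lazy_int.dtype)
```

The astype form states the intent outright, which is probably why it is the one to prefer in code other people will read.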
Upvotes: 3