user3712093

Reputation: 11

How to turn a numpy array mask (boolean) into floats or ints

I'm trying to create a frequency-of-occurrence map from a 3D array with dimensions (time, lat, lon). I should end up with a 2D lat/lon array of frequencies. The code below outlines my approach. I run into problems at step d, where I convert the inverted boolean array mask to numerical values. I accidentally found a way to do it (np.mean), but I don't know why it works. I can't see why np.mean turned the booleans into floats but didn't actually calculate the mean along the requested axis; I had to apply np.mean a second time to get the desired result. I feel there must be a proper way to convert a boolean array to floats or integers. Also, if you can think of a better way to accomplish the task, fire away. My numpy mojo is weak and this was the only approach I could come up with.

import numpy as np

# test 3D array in time, lat, lon; values are percents
# real array is size=(30,721,1440)

a = np.random.randint(0, 101, size=(3,4,5))  # random_integers is deprecated; upper bound here is exclusive
print(a)

# Exclude all data outside the interval 0 - 20 (first quintile)
# Repeat for 21-40, 41-60, 61-80, 81-100

b = np.ma.masked_outside(a, 0, 20)
print("\n\nMasked array:")
print(b)

# Because mask is false where data within quintile, need to invert

c = [~b.mask] 
print("\n\nInverted mask:")
print(c)

# Accidental way to turn True/False to 1./0., but that's what I want

d = np.mean(c, axis = 0)  
print("\n\nWhy does this work? How should I be doing it?")
print(d)

# This is the mean I want.  Gives desired end result

e = np.mean(d, axis = 0)
print("\n\nFrequency Map")
print(e)

How do I convert the boolean values in my (inverted) array mask to numerical (1 and 0)?

Upvotes: 0

Views: 6043

Answers (1)

DSM

Reputation: 353179

It "works" because your c isn't what you think it is:

>>> c
[array([[[False, False, False, False, False],
        [False, False, False, False,  True],
        [False, False, False, False, False],
        [False, False, False, False, False]],

       [[False, False, False, False, False],
        [False, False, False, False,  True],
        [False, False, False,  True, False],
        [False, False, False, False,  True]],

       [[False, False, False, False, False],
        [False, False, False, False, False],
        [False,  True, False, False, False],
        [ True, False,  True,  True, False]]], dtype=bool)]
>>> type(c)
<type 'list'>

It's not an array, it's a list containing an array. So when you take

d = np.mean(c, axis = 0)  

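you're taking the mean of a list of one element. np.mean first converts that list to an array of shape (1, 3, 4, 5), so axis 0 is the length-1 list axis, and averaging over it simply returns the array itself (but converted to float, because that's what mean does, and float(True) == 1.0).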
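To make the effect of the extra brackets concrete, here is a minimal sketch. The small `mask` array is a made-up stand-in for `~b.mask`, and the names `wrapped` and `as_array` are only for illustration:

```python
import numpy as np

# Hypothetical stand-in for the inverted mask; the values are arbitrary.
mask = np.array([[[True, False], [False, True]],
                 [[False, False], [True, True]],
                 [[True, True], [False, False]]])

wrapped = [mask]                  # a list holding one array, like c in the question
as_array = np.asarray(wrapped)    # roughly what np.mean works with after conversion
print(as_array.shape)             # (1, 3, 2, 2)

d = np.mean(wrapped, axis=0)      # averages over the length-1 list axis only
print(d.shape, d.dtype)           # (3, 2, 2) float64
print(np.array_equal(d, mask.astype(float)))  # True
```

The first "mean" therefore only changes the dtype; the time axis is untouched until the second call.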

Instead, lose the unneeded brackets:

>>> c = ~b.mask
>>> output = c.mean(axis=0)
>>> output
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.66666667],
       [ 0.        ,  0.33333333,  0.        ,  0.33333333,  0.        ],
       [ 0.33333333,  0.        ,  0.33333333,  0.33333333,  0.33333333]])
>>> np.allclose(output, e)
True
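As for a simpler route to the frequency map, one possibility (a sketch, not necessarily what the asker needs) is to skip the masked array and average a plain boolean comparison over the time axis. The seed, the `rng` object, and the names `in_quintile` and `freq` are assumptions made for the example:

```python
import numpy as np

rng = np.random.RandomState(0)            # fixed seed, only for reproducibility
a = rng.randint(0, 101, size=(3, 4, 5))   # integers in 0..100 inclusive

in_quintile = (a >= 0) & (a <= 20)        # True where the value lies in the first quintile
freq = in_quintile.mean(axis=0)           # fraction of time steps, shape (4, 5)

# Cross-check against the masked-array route from the question.
b = np.ma.masked_outside(a, 0, 20)
c = ~np.ma.getmaskarray(b)                # getmaskarray always returns a full boolean array
print(np.allclose(freq, c.mean(axis=0)))  # True
```

The same comparison can be repeated with the bounds 21-40, 41-60, and so on for the other quintiles.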

BTW, the canonical way to convert from bool to float or int is astype, e.g. c.astype(float) or c.astype(int). To be honest, sometimes I'm lazy and simply write c + 0.0 or c + 0. You didn't hear that from me, though.
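A short sketch of those conversions side by side, using a small made-up boolean array in place of the real mask:

```python
import numpy as np

# Hypothetical boolean array standing in for the inverted mask.
c = np.array([[True, False], [False, True]])

as_float = c.astype(float)   # explicit conversion to floating point
as_int = c.astype(int)       # explicit conversion to integers

lazy_float = c + 0.0         # arithmetic with a float promotes bool to float
lazy_int = c + 0             # arithmetic with an int promotes bool to an integer type

print(as_float)
print(as_int)
print(np.array_equal(as_float, lazy_float), np.array_equal(as_int, lazy_int))
```

The astype form states the intent directly, which is probably why it is the canonical one; the arithmetic form relies on type promotion to reach the same values.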

Upvotes: 3
