rrigaud
rrigaud

Reputation: 31

Extracting mean values from masked 2-D array

I want to extract a 12º x 12º region from lat/long/conductivity grids and calculate the mean conductivity values in this region. I can successfully apply masks on the lat/long grids, but somehow the same process is not working for the conductivity grid.

I've tried masking with for loops and now I'm using numpy.ma.masked_where function. I can successfully plot masked results (i.e: I can see that the region is extracted when I plot global maps), but the calculated mean conductivity values are corresponding to non-masked data.

I did a simple example of what I want to do:

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

x = ma.masked_outside(x, xm-3, xm+3)
y = ma.masked_outside(x, ym-3, ym+3)
x = np.ma.filled(x.astype(float), np.nan)
y = np.ma.filled(y.astype(float), np.nan)

x, y = np.meshgrid(x, y)

z = 2*x + 3*y

z = np.ma.masked_where(np.ma.getmask(x), z)

plt.pcolor(x, y, z)
plt.colorbar()

print('Maximum z:', np.nanmax(z))
print('Minimum z:', np.nanmin(z))
print('Mean z:', np.nanmean(z))

My code is:

def Observatory_Cond_Plot(filename, ndcfile, obslon, obslat, obsname, date):

files = np.array(sorted(glob.glob(filename))) #sort txt files containing the 2-D conductivitiy arrays]

filenames = ['January', 'February', 'March', 'April', 'May', 'June', 
             'July', 'August', 'September', 'October', 'November', 'December'] #used for naming output plots and files

for i, fx in zip(filenames, files):

    ndcdata = Dataset(ndcfile) #load netcdf file

    lat = ndcdata.variables['latitude'][:] #import latitude data

    long = ndcdata.variables['longitude'][:] #import longitude data

    cond = np.genfromtxt(fx)

    cond, long = shiftgrid(180., cond, long, start=False) 

    #Mask lat and long arrays and fill masks with nan values

    lat = ma.masked_outside(lat, obslat-12, obslat+12)
    long = ma.masked_outside(long, obslon-12, obslon+12)
    lat = np.ma.filled(lat.astype(float), np.nan)
    long = np.ma.filled(long.astype(float), np.nan)

    longrid, latgrid = np.meshgrid(long, lat)

    cond = np.ma.masked_where(np.ma.getmask(longrid), cond)
    cond = np.ma.filled(cond.astype(float), np.nan)

    condmean = np.nanmean(cond)

    print('Mean Conductivity is:', condmean)
    print('Minimum conductivity is:', np.nanmin(cond))
    print('Maximum conductivity is:', np.nanmax(cond))

After that, the rest of the code just plots the data

My results are:

Mean Conductivity is: 3.5241649673154587 Minimum conductivity is: 0.497494528344129 Maximum conductivity is: 5.997825822915771

However, from tmy maps, it is clear that the conductivity in this region should not be lower than 3.2 S/m. Also, printing lat, long and cond grids:

long:

[[nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 ...
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]
 [nan nan nan ... nan nan nan]]

lat:

[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]

cond:

[[       nan        nan        nan ...        nan        nan        nan]
[       nan        nan        nan ...        nan        nan        nan]
 [2.86749432 2.86743283 2.86746221 ... 2.87797247 2.87265508 2.87239185]
 ...
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]
 [       nan        nan        nan ...        nan        nan        nan]]

And it seems like the mask is not working properly.

Upvotes: 2

Views: 477

Answers (2)

ClimateUnboxed
ClimateUnboxed

Reputation: 8087

Beware that any kind of simple mean calculation (summing and dividing by the total), such as np.mean will not give you the correct answer if you are averaging over the lat-lon grid, since the area changes as you move towards the poles. You need to take a weighted average, weighting by cos(lat).

As you say you have the data in netcdf format, I hope you will permit me to suggest an alternative solution from the command line using the utility climate data operators (cdo) (on ubuntu you can install with sudo apt install cdo).

to extract the region of interest:

cdo sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc

then you can work out the correct weighted mean with

cdo fldmean infile.nc outfile.nc

you can pipe the two together like this:

cdo fldmean -sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc

Upvotes: 0

kmuehlbauer
kmuehlbauer

Reputation: 461

The problem is that the call of np.ma.filled will de-mask the long variable. Also np.meshgrid doesn't preserve the masks.

You could save the masks directly after creation and also create the meshgrid from the masks. I adapted your example accordingly. What can be seen is, that all versions of numpy mean take the mask into account. I had to adapt the upper limit (changed to 2), because the mean has been equal.

x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)

xm = np.median(x)
ym = np.median(y)

# Note: changed limits
x = np.ma.masked_outside(x, xm-3, xm+2)
y = np.ma.masked_outside(x, ym-3, ym+2)
xmask = np.ma.getmask(x)
ymask = np.ma.getmask(y)

x, y = np.meshgrid(x, y)
xmask, ymask = np.meshgrid(xmask, ymask)

z = 2*x + 3*y


z1 = np.ma.masked_where(np.ma.getmask(x), z)
z2 = np.ma.masked_where(xmask | ymask, z)
print(z1)
print(z2)

print('Type z1, z2:', type(z1), type(z2))
print('Maximum z1, z2:', np.nanmax(z1), np.nanmax(z2))
print('Minimum z1, z2:', np.nanmin(z1), np.nanmin(z2))
print('Mean z1, z2:', np.mean(z1), np.mean(z2) )
print('nan Mean z1, z2:', np.nanmean(z1), np.nanmean(z2) )
print('masked Mean z1, z2:', z1.mean(), z2.mean())

Upvotes: 1

Related Questions