Reputation: 31
I want to extract a 12º x 12º region from lat/long/conductivity grids and calculate the mean conductivity values in this region. I can successfully apply masks on the lat/long grids, but somehow the same process is not working for the conductivity grid.
I've tried masking with for loops and now I'm using numpy.ma.masked_where function. I can successfully plot masked results (i.e: I can see that the region is extracted when I plot global maps), but the calculated mean conductivity values are corresponding to non-masked data.
I did a simple example of what I want to do:
x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)
xm = np.median(x)
ym = np.median(y)
x = ma.masked_outside(x, xm-3, xm+3)
y = ma.masked_outside(x, ym-3, ym+3)
x = np.ma.filled(x.astype(float), np.nan)
y = np.ma.filled(y.astype(float), np.nan)
x, y = np.meshgrid(x, y)
z = 2*x + 3*y
z = np.ma.masked_where(np.ma.getmask(x), z)
plt.pcolor(x, y, z)
plt.colorbar()
print('Maximum z:', np.nanmax(z))
print('Minimum z:', np.nanmin(z))
print('Mean z:', np.nanmean(z))
My code is:
def Observatory_Cond_Plot(filename, ndcfile, obslon, obslat, obsname, date):
files = np.array(sorted(glob.glob(filename))) #sort txt files containing the 2-D conductivitiy arrays]
filenames = ['January', 'February', 'March', 'April', 'May', 'June',
'July', 'August', 'September', 'October', 'November', 'December'] #used for naming output plots and files
for i, fx in zip(filenames, files):
ndcdata = Dataset(ndcfile) #load netcdf file
lat = ndcdata.variables['latitude'][:] #import latitude data
long = ndcdata.variables['longitude'][:] #import longitude data
cond = np.genfromtxt(fx)
cond, long = shiftgrid(180., cond, long, start=False)
#Mask lat and long arrays and fill masks with nan values
lat = ma.masked_outside(lat, obslat-12, obslat+12)
long = ma.masked_outside(long, obslon-12, obslon+12)
lat = np.ma.filled(lat.astype(float), np.nan)
long = np.ma.filled(long.astype(float), np.nan)
longrid, latgrid = np.meshgrid(long, lat)
cond = np.ma.masked_where(np.ma.getmask(longrid), cond)
cond = np.ma.filled(cond.astype(float), np.nan)
condmean = np.nanmean(cond)
print('Mean Conductivity is:', condmean)
print('Minimum conductivity is:', np.nanmin(cond))
print('Maximum conductivity is:', np.nanmax(cond))
After that, the rest of the code just plots the data
My results are:
Mean Conductivity is: 3.5241649673154587 Minimum conductivity is: 0.497494528344129 Maximum conductivity is: 5.997825822915771
However, from tmy maps, it is clear that the conductivity in this region should not be lower than 3.2 S/m. Also, printing lat, long and cond grids:
long:
[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]
lat:
[[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
...
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]
[nan nan nan ... nan nan nan]]
cond:
[[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
[2.86749432 2.86743283 2.86746221 ... 2.87797247 2.87265508 2.87239185]
...
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]
[ nan nan nan ... nan nan nan]]
And it seems like the mask is not working properly.
Upvotes: 2
Views: 477
Reputation: 8087
Beware that any kind of simple mean calculation (summing and dividing by the total), such as np.mean will not give you the correct answer if you are averaging over the lat-lon grid, since the area changes as you move towards the poles. You need to take a weighted average, weighting by cos(lat).
As you say you have the data in netcdf format, I hope you will permit me to suggest an alternative solution from the command line using the utility climate data operators (cdo) (on ubuntu you can install with sudo apt install cdo).
to extract the region of interest:
cdo sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc
then you can work out the correct weighted mean with
cdo fldmean infile.nc outfile.nc
you can pipe the two together like this:
cdo fldmean -sellonlatbox,lon1,lon2,lat1,lat2 infile.nc outfile.nc
Upvotes: 0
Reputation: 461
The problem is that the call of np.ma.filled
will de-mask the long
variable. Also np.meshgrid
doesn't preserve the masks.
You could save the masks directly after creation and also create the meshgrid from the masks. I adapted your example accordingly. What can be seen is, that all versions of numpy mean
take the mask into account. I had to adapt the upper limit (changed to 2), because the mean has been equal.
x = np.linspace(1, 10, 10)
y = np.linspace(1, 10, 10)
xm = np.median(x)
ym = np.median(y)
# Note: changed limits
x = np.ma.masked_outside(x, xm-3, xm+2)
y = np.ma.masked_outside(x, ym-3, ym+2)
xmask = np.ma.getmask(x)
ymask = np.ma.getmask(y)
x, y = np.meshgrid(x, y)
xmask, ymask = np.meshgrid(xmask, ymask)
z = 2*x + 3*y
z1 = np.ma.masked_where(np.ma.getmask(x), z)
z2 = np.ma.masked_where(xmask | ymask, z)
print(z1)
print(z2)
print('Type z1, z2:', type(z1), type(z2))
print('Maximum z1, z2:', np.nanmax(z1), np.nanmax(z2))
print('Minimum z1, z2:', np.nanmin(z1), np.nanmin(z2))
print('Mean z1, z2:', np.mean(z1), np.mean(z2) )
print('nan Mean z1, z2:', np.nanmean(z1), np.nanmean(z2) )
print('masked Mean z1, z2:', z1.mean(), z2.mean())
Upvotes: 1