Fir Nor

Reputation: 193

Calculate a variable's mean over a selected area of a gridded netCDF file

Let's say we have TRMM precipitation data, with one file per month. For example, the files in the folder are:

     3B42.1998.01.01.7A.nc,
     3B42.1998.02.01.7A.nc, 
     3B42.1998.03.01.7A.nc, 
     3B42.1998.04.01.7A.nc, 
     3B42.1998.05.01.7A.nc, 
     ......
     ......
     3B42.2010.11.01.7A.nc,         
     3B42.2010.12.01.7A.nc.

These files have the following dimensions: Xsize=1440, Ysize=400, Zsize=1, Tsize=1, with longitude running from 0 to 360 and latitude from -50 to 50. I want to calculate the amount of precipitation over a certain region, say between lon=98.5 and lon=100 and between lat=4 and lat=6.5. That means reading the variable only in this region:

     ---------------------------
     |lon:98.5 lat:6.5         |
     |                         |
     |           lat:4 lon:100 |
     ---------------------------

I used to do this in GrADS (Grid Analysis and Display System), where it can be done like this (simplified version):

      yy=1998
      while yy < 2011
        'sdfopen f:\data\trmm\3B42.'yy'.12.01.7A.nc'
        'd aave(pcp,lon=98.5,lon=100.0,lat=4.0,lat=6.5)'
         res=subwrd(result,4)
         rec=write('d:\precip.sp.TRMM3B42.1.'yy'.csv',res,append)
         yy = yy+1
      endwhile

I tried to do the same thing in Python, but something went wrong. After a few suggestions, here is where I am now:

     import csv
     import netCDF4 as nc 
     import numpy as np

     #calculating december only
     f = nc.MFDataset('d:/data/trmm/3B43.????.12.01.7A.nc')#maybe I shouldn't do MFDataset?
     pcpt = f.variables['pcp']
     lon = f.variables['longitude']
     lat = f.variables['latitude']
     # Determine which latitudes and longitudes fall inside the box
     latidx1 = (lat >=4.0 ) & (lat <=6.5 ) 
     lonidx1 = (lon >=98.5 ) & (lon <=100.0 ) 

     rainf1 = pcpt[:]
     rainf1 = rainf1[:, latidx1][..., lonidx1]
     rainf_1 = rainf1

     with open('d:/trmmtest.csv', 'wb') as fp:
          a = csv.writer(fp)
          for i in rainf_1:
              a.writerow([i])

This script produces a list of (in my case) 15 values in the CSV file. But when I try to get the values for another region and adjust what I think needs changing, say:

     latidx2 = (lat >=1.0 ) & (lat <=1.5 ) 
     lonidx2 = (lon >=102.75 ) & (lon <=103.25 ) 

     rainf2 = pcpt[:]
     rainf2 = rainf2[:, latidx2][..., lonidx2]
     rainf_2 = rainf2

I get the same values as the first one.

firstarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613,0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]

secondarea=[0.511935,1.0771,0.613548,1.48839,0.445161,1.39161,1.03548,0.452903, 3.07725,2.84613,0.701613,2.10581,2.47839,3.84097,2.41065,1.38387]

I tested this in separate scripts and it still gives me the same values. I checked on a map (constructed earlier) and the values are different in those two regions (for the December average).

Any idea why? Is there a more elegant way of writing this? Thanks.

Upvotes: 2

Views: 8639

Answers (3)

ClimateUnboxed

Reputation: 8087

I just want to point out that Fir Nor's solution is incorrect (update: Fir Nor's post has been deleted; it suggested a solution based on np.mean). You can't simply take the arithmetic mean (np.mean) of spatial data on a regular lat/lon grid, as is the case here, because the grid-cell area shrinks as you move towards the poles.

Here is a discussion on the python xarray pages that demonstrates the differences that occur if you do not apply a weighted mean.

I've also made a Climate Unboxed YouTube video on this topic explaining why the unweighted mean is incorrect and how to use CDO to calculate spatial statistics.

1. CDO solution:

Far better not to worry about this and do the operation with CDO:

cdo fldmean -sellonlatbox,98.5,100,4,6.5 3B42.1998.05.01.7A.nc boxav.nc
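
If you then want the box average back in Python (for example, to append it to a CSV as in the question), a minimal sketch along these lines should work, assuming the fldmean output keeps the variable name pcp:

import netCDF4 as nc

ds = nc.Dataset('boxav.nc')
# fldmean leaves size-1 lon/lat dimensions, so squeeze them away
box_mean = ds.variables['pcp'][:].squeeze()
print(box_mean)
ds.close()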

2. Python solution

If you want to do this in Python, you need to generate weights for your subregion, which can be extracted as in your solution (or using xarray.where).

If your latitude is 1D, you can turn it into a 2D array using numpy.meshgrid, then generate the weights on that 2D array and calculate the weighted average:

 weights = np.cos(np.deg2rad(lat2d))
 meanrain = np.average(pcpt, weights=weights)  # note: weights must have the same shape as the array being averaged
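
Putting this together with the subsetting from the question, a sketch could look like the following; the file pattern and variable names are taken from the question, and pcp is assumed to come out as (time, lat, lon) once read (add a squeeze() if there is a size-1 level dimension):

import numpy as np
import netCDF4 as nc

f = nc.MFDataset('d:/data/trmm/3B43.????.12.01.7A.nc')
pcpt = f.variables['pcp'][:]        # assumed shape (time, lat, lon)
lat = f.variables['latitude'][:]    # note the [:] so we work with numpy arrays
lon = f.variables['longitude'][:]

latidx = (lat >= 4.0) & (lat <= 6.5)
lonidx = (lon >= 98.5) & (lon <= 100.0)

sub = pcpt[:, latidx, :][:, :, lonidx]                # precipitation in the box
lon2d, lat2d = np.meshgrid(lon[lonidx], lat[latidx])  # 2D latitude for the weights
weights = np.cos(np.deg2rad(lat2d))

# area-weighted mean of the box for each time step (one December per year)
box_means = [np.average(sub[t], weights=weights) for t in range(sub.shape[0])]
f.close()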

Another example of the weights calculation using xarray, along with a diagnostic of the error, can be found in my answer here:

Upvotes: 3

Robert Wilson

Reputation: 3407

If you are working on Linux, this can be solved using nctoolkit (nctoolkit.readthedocs.io/en/latest/). The following should do everything:

import nctoolkit as nc
ff = '~/data/TRMM3H/3B42.19980101.12.7A.nc'
data = nc.open_data(ff)
data.crop(lon = [98.5, 100], lat = [4, 6.5])  # select the lon/lat box
data.spatial_mean()                           # area-weighted spatial mean

Note: this uses CDO as a backend, and spatial_mean will calculate the mean weighted by the area of each grid cell.
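
To get the numbers out afterwards (for example into a CSV like in the question), something along these lines should work, assuming your nctoolkit version provides to_dataframe(); the output filename is just an example:

df = data.to_dataframe()       # pandas DataFrame with the spatially averaged series
df.to_csv('trmm_boxmean.csv')  # example output path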

Upvotes: 2

ShGh

Reputation: 67

I believe it can be done easily with the easymore package.

The first step is the creation of a shapefile. This can contain any shape (such as a point, a subbasin or a rectangle). In your case it will be a rectangular shapefile with one shape that defines the boundary. This can be done in QGIS, ArcGIS or in Python (see the sketch after the link below):

Creating a shape file from a bounding box coordinates list
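
For instance, here is a minimal sketch using geopandas and shapely (assumed to be installed; the output path simply matches the target_shp used below) to build a one-polygon box shapefile:

import geopandas as gpd
from shapely.geometry import box

# rectangle covering lon 98.5 to 100 and lat 4 to 6.5 (box takes minx, miny, maxx, maxy)
bbox = box(98.5, 4.0, 100.0, 6.5)
gdf = gpd.GeoDataFrame({'ID': [1]}, geometry=[bbox], crs='EPSG:4326')
gdf.to_file('path/target_shapefiles/box.shp')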

Next, call the easymore Python package and remap the variable onto the shapefile of interest, which is as simple as the following:

# loading EASYMORE
from easymore.easymore import easymore

# initializing EASYMORE object
esmr = easymore()

# specifying EASYMORE objects
# name of the case
esmr.case_name                = 'TRMM_3B43'              
# temporary path that the EASYMORE generated GIS files and remapped file will be saved
esmr.temp_dir                 = 'path/temporary/'
# name of target shapefile that the source netcdf files should be remapped to;
# it was created in the first step
esmr.target_shp               = 'path/target_shapefiles/box.shp'
# name of netCDF file(s); multiple files can be specified with *
esmr.source_nc                = 'd:/data/trmm/3B43*.nc'
# name of variables from source netCDF file(s) to be remapped
esmr.var_names                = ['pcp']
# name of variable longitude in source netCDF files
esmr.var_lon                  = 'longitude'
# name of variable latitude in source netCDF files
esmr.var_lat                  = 'latitude'
# name of variable time in source netCDF file; should be always time
esmr.var_time                 = 'time'
# location where the remapped netCDF, csv file will be saved
esmr.output_dir               = 'path/output/'
# if required that the remapped values to be saved as csv as well
esmr.save_csv                 = True

# execute EASYMORE nc remapper
esmr.nc_remapper()

This code will generate both a remapped nc file and its csv version in the output directory for every original nc file. The remapped files will contain the areal average of precipitation over the shape(s) of interest at the original time resolution (for example daily). You can then easily upscale them to a monthly time step and compare.

The advantages:

1- With this package you can provide a shapefile that has multiple shapes (regions of interest) and it does the remapping in one go. For example, you could give it a shapefile of the countries of the world instead.

2- If your box is smaller than a grid cell, or is a point, the returned value will be that of the grid cell the small box or point falls in.

3- The remapping and weighting are done in an equal-area projection to account for the varying area of WGS84 grid cells at higher latitudes.

4- The code is smart enough that you don't need to worry about the netCDF file using a 0 to 360 lon convention while the target shapefile uses -180 to 180. For example, if the box is in North America (negative lon values), the shapefile can be given in the -180 to 180 format while the nc file keeps non-negative lon values (0 to 360).

GitHub page with more examples:

https://github.com/ShervanGharari/EASYMORE

Upvotes: 0
