Reputation: 111
I am looking at satellite data in netCDF format using xarray, but I first need to convert the dimensions from scanline (the y index, corresponding to the satellite scan direction) and ground_pixel (the x index, corresponding to the across-track direction, perpendicular to the scan) to latitude and longitude. The latitude and longitude are currently defined as 2-D coordinates of the form latitude(scanline, ground_pixel). How can I convert these into 1-D dimension coordinates, latitude(latitude) and longitude(longitude)? I'd like to be able to plot and query the dataset using lat/lon coordinates and xarray's query/plotting functions.
Here's a picture of the xarray Dataset. I haven't yet been able to reproduce a simple example of this data format, with the two dimensions defined for the latitude and longitude coordinates.
The geographical latitudes and longitudes can be read with ds.latitude.values and ds.longitude.values, but these are 2-D arrays indexed by scanline and ground_pixel. I think I need to collapse them into single 1-D lists of latitudes and longitudes.
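In case it helps, here's a minimal sketch of a dataset with the same layout, using made-up coordinate values and a placeholder variable name (synthetic, not the real product):
import numpy as np
import xarray as xr

# Synthetic stand-in: a variable indexed by (scanline, ground_pixel),
# with 2-D latitude/longitude coordinates attached, as in the real file.
n_scan, n_pix = 188, 109

# Curvilinear coordinates: each (scanline, ground_pixel) cell gets its own lat/lon.
scan = np.arange(n_scan)[:, None]
pix = np.arange(n_pix)[None, :]
lat2d = 30.0 + 0.05 * scan + 0.01 * pix
lon2d = -10.0 + 0.01 * scan + 0.05 * pix

ds = xr.Dataset(
    {"value": (("scanline", "ground_pixel"), np.random.rand(n_scan, n_pix))},
    coords={
        "latitude": (("scanline", "ground_pixel"), lat2d),
        "longitude": (("scanline", "ground_pixel"), lon2d),
    },
)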
Upvotes: 2
Views: 1733
Reputation: 111
The data is on an irregular (curvilinear) grid, so I used xESMF to regrid it onto a regular lat/lon grid (see the xESMF docs for further info: https://xesmf.readthedocs.io/en/latest/notebooks/Pure_numpy.html):
%%time
import numpy as np
import xarray as xr
import xesmf as xe

lats = ds_subset.latitude.values
lons = ds_subset.longitude.values

# Build regular 1-D lat/lon arrays spanning the data extent.
# 100 points is just to test the method quickly; a larger value is
# required to best represent the resolution of the original data.
grid_lats = np.linspace(lats.min(), lats.max(), 100)
grid_lons = np.linspace(lons.min(), lons.max(), 100)

# The target grid that the data will be regridded to.
new_grid = xr.Dataset({'lat': (['lat'], grid_lats), 'lon': (['lon'], grid_lons)})

# periodic=True is only for global grids that wrap in longitude; this is
# regional data, so use periodic=False. I found that the nearest-neighbour
# method works best here; there's data loss with bilinear.
regridder = xe.Regridder(ds_subset, new_grid, 'nearest_s2d', periodic=False)

# Regrid the data onto the new grid.
ds_new = regridder(ds_subset)
ds_new
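Once the data is on the regular grid, xarray's usual lat/lon selection and plotting work as hoped. A quick usage sketch (the variable name is assumed from the dataset; adjust as needed):
# Plot the regridded field as a colormesh on the regular lat/lon grid.
ds_new.methane_mixing_ratio_bias_corrected.plot()

# Query the nearest grid cell to a given lat/lon point.
point = ds_new.sel(lat=52.5, lon=4.8, method="nearest")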
Upvotes: 0
Reputation: 15442
Given your description of the data, it seems like this is observational data from a set of satellite passes (scanlines), reported for all pixels within range of the sensor on each pass. Maybe each of the 188 scanlines had as many as 109 pixels within range on that day. They're essentially little circles or blobs within the larger grid, with each blob indexed by the scanline ID.
Since this is such a small dataset, the easiest way to convert it to a grid would probably be to drop into pandas, group on pixels, and then convert back to xarray. The following will return the mean observed value for each latitude/longitude pair:
gridwise_mean = (
    ds_subset.to_dataframe()
    .dropna(how="all")
    .groupby(["latitude", "longitude"])
    .methane_mixing_ratio_bias_corrected
    .mean()
    .to_xarray()
)
Note that this will return an nLats x nLats array. If you have good coverage of all pixels, and the latitudes/longitudes are truly on a regular grid, then this will likely be a pretty reasonable result to work with, and plotting a colormesh with e.g. gridwise_mean.plot() should return a nice plot of the average observation for each pixel.
Warning: If your latitudes/longitudes are not on a regular grid, this could explode your memory. At worst, if each data point has a unique lat/lon value attached, the result would be (188 * 109) ^ 2 = 420 million points, or about 3.1 GB, with only one non-NaN data point per lat/lon pair. This grows quickly if you apply the method to a larger number of points.
To diagnose whether you have such an issue, you could first compute the number of unique latitudes and longitudes in the dataset with e.g.
np.unique(ds_subset.latitude)
and make sure the product of the number of unique lats and lons is a reasonably small number, and is much smaller than the total number of points in the original dataset.
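Concretely, that check could look something like this (a sketch; the 8-byte-float memory estimate matches the warning above):
import numpy as np

n_lats = np.unique(ds_subset.latitude).size
n_lons = np.unique(ds_subset.longitude).size
n_points = ds_subset.latitude.size  # 188 * 109 here

# The dense result will have n_lats * n_lons cells; compare with the
# number of actual observations, and estimate memory at 8 bytes per cell.
print(n_lats * n_lons, "grid cells vs", n_points, "observations")
print(n_lats * n_lons * 8 / 1e9, "GB for a dense float64 array")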
Other summary statistics, such as the count, standard deviation, min, and max, might also be useful to know; you can compute several at once with:
gridwise_summary = (
    ds_subset.to_dataframe()
    .groupby(["latitude", "longitude"])
    .methane_mixing_ratio_bias_corrected
    .agg(["mean", "count", "std", "max", "min"])
    .to_xarray()
)
This will return an xr.Dataset where the variables are the above reductions, which can be accessed with e.g. gridwise_summary["max"].
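For instance, to look at the per-cell observation count (a quick usage sketch):
# Plot how many observations landed in each lat/lon cell.
gridwise_summary["count"].plot()

# Pull out a single statistic as its own DataArray.
max_per_cell = gridwise_summary["max"]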
Upvotes: 1