Why latitudes and longitudes are two dimensional arrays in netcdf file?

Question

I have netCDF file, which contains temperature data over some location. Data shape is 1450x900.

I am creating search functionality in my app, to locate temperature data with lat, lon values.

So I extracted lat and lon coordinates data from netCDf file, but I was expecting that they would be 1D arrays and instead got 2D arrays with 1450x900 shape for both coordinates.

So my question: why they are 2d arrays, instead of 1450 latitude values and 900 longitude values? Doesn't 1450 lat values and 900 lon values describe whole grid?

Lets say we have 4x5 square, indices for locating rightmost and bottom-most point of the grid will be [4, 5]. So my indices for x will be[1, 2, 3, 4] and for y: [1, 2, 3, 4, 5]. 9 indices in total are enough to locate any point on that grid (consisting of 20 cells). So why do lat (x) and lon (y) coordinates in netcdf file contain 20 indices separately (40 in total), instead of 4 and 5 indices respectively (9 in total)? Hope you get what confuses me.

Is it possible to somehow map those 2D arrays and "downgrade" to 1450 latitude values and 900 longitude values? OR it is ok as it is right now? How can I use those values for my intention? Do I need to zip lat lon arrays?

here are the shapes:

>>> DS = xarray.open_dataset('file.nc')
>>> DS.tasmin.shape
    (31, 1450, 900)
>>> DS.projection_x_coordinate.shape
    (900,)
>>> DS.projection_y_coordinate.shape
    (1450,)
>>> DS.latitude.shape
    (1450, 900)
>>> DS.longitude.shape
    (1450, 900)

consider that projection_x_coordinate and projection_y_coordinate are easting/northing values not lat/longs

here is the metadata of file if needed:

Dimensions:                       (bnds: 2, projection_x_coordinate: 900, projection_y_coordinate: 1450, time: 31)
Coordinates:
  * time                          (time) datetime64[ns] 2018-12-01T12:00:00 ....
  * projection_y_coordinate       (projection_y_coordinate) float64 -1.995e+0...
  * projection_x_coordinate       (projection_x_coordinate) float64 -1.995e+0...
    latitude                      (projection_y_coordinate, projection_x_coordinate) float64 ...
    longitude                     (projection_y_coordinate, projection_x_coordinate) float64 ...
Dimensions without coordinates: bnds
Data variables:
    tasmin                        (time, projection_y_coordinate, projection_x_coordinate) float64 ...
    transverse_mercator           int32 ...
    time_bnds                     (time, bnds) datetime64[ns] ...
    projection_y_coordinate_bnds  (projection_y_coordinate, bnds) float64 ...
    projection_x_coordinate_bnds  (projection_x_coordinate, bnds) float64 ...
Attributes:
    comment:        Daily resolution gridded climate observations
    creation_date:  2019-08-21T21:26:02
    frequency:      day
    institution:    Met Office
    references:     doi: 10.1002/joc.1161
    short_name:     daily_mintemp
    source:         HadUK-Grid_v1.0.1.0
    title:          Gridded surface climate observations data for the UK
    version:        v20190808
    Conventions:    CF-1.5

alani · Accepted Answer

Your data adheres to version 1.5 of the Climate and Forecast conventions.

The document describing this version of the conventions is here, although the relevant section is essentially unchanged across many versions of the conventions.

See section 5.2:

5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables

The latitude and longitude coordinates of a horizontal grid that was not defined as a Cartesian product of latitude and longitude axes, can sometimes be represented using two-dimensional coordinate variables. These variables are identified as coordinates by use of the coordinates attribute.

It looks like you are using the HadOBS 1km resolution gridded daily minimum temperature, and this file in particular:

http://dap.ceda.ac.uk/thredds/fileServer/badc/ukmo-hadobs/data/insitu/MOHC/HadOBS/HadUK-Grid/v1.0.1.0/1km/tasmin/day/v20190808/tasmin_hadukgrid_uk_1km_day_20181201-20181231.nc (warning: >300MB download)

As it states, the data is on a transverse mercator grid.

If you look at output from ncdump -h you will also see the following description of the grid expressed by means of attributes of the transverse_mercator dummy variable:

        int transverse_mercator ;
                transverse_mercator:grid_mapping_name = "transverse_mercator" ;
                transverse_mercator:longitude_of_prime_meridian = 0. ;
                transverse_mercator:semi_major_axis = 6377563.396 ;
                transverse_mercator:semi_minor_axis = 6356256.909 ;
                transverse_mercator:longitude_of_central_meridian = -2. ;
                transverse_mercator:latitude_of_projection_origin = 49. ;
                transverse_mercator:false_easting = 400000. ;
                transverse_mercator:false_northing = -100000. ;
                transverse_mercator:scale_factor_at_central_meridian = 0.9996012717 ;

and you will also see that the coordinate variables projection_x_coordinate and projection_y_coordinate have units of metres.

The grid in question is the Ordnance Survey UK grid using numeric grid references.
See for example this description of the OS grid (from Wikipedia).

If you wish to express the data on a regular longitude-latitude grid then you will need to do some type of interpolation. I see that you are using xarray. You can combine this with pyresample to do the interpolation. Here is an example:

import xarray as xr
import numpy as np
from pyresample.geometry import SwathDefinition
from pyresample.kd_tree import resample_nearest, resample_gauss

ds = xr.open_dataset("tasmin_hadukgrid_uk_1km_day_20181201-20181231.nc")

# Define a target grid. For sake of example, here is one with just 
# 3 longitudes and 4 latitudes.
lons = np.array([-2.1, -2., -1.9])
lats = np.array([51.7, 51.8, 51.9, 52.0])

# The target grid is regular (1-d lon, lat coordinates) but we will need
# a 2d version (similar to the input grid), so use numpy.meshgrid to produce this.
lon2d, lat2d = np.meshgrid(lons, lats)

origin_grid = SwathDefinition(lons=ds.longitude, lats=ds.latitude)
target_grid = SwathDefinition(lons=lon2d, lats=lat2d)

# get a numpy array for the first timestep
data = ds.tasmin[0].to_masked_array()

# nearest neighbour interpolation example
# Note that radius_of_influence has units metres

interpolated = resample_nearest(origin_grid, data, target_grid, radius_of_influence=1000)

# GIVES:
#      array([[5.12490065, 5.02715332, 5.36414835],
#             [5.08337723, 4.96372838, 5.00862833],
#             [6.47538931, 5.53855722, 5.11511239],
#             [6.46571817, 6.17949381, 5.87357538]])


# gaussian weighted interpolation example
# Note that radius_of_influence and sigmas both have units metres

interpolated = resample_gauss(origin_grid, data, target_grid, radius_of_influence=1000, sigmas=1000)

# GIVES:
#      array([[5.20432465, 5.07436805, 5.39693221],
#             [5.09069187, 4.8565934 , 5.08191639],
#             [6.4505963 , 5.44018209, 5.13774416],
#             [6.47345359, 6.2386732 , 5.62121948]])

Why latitudes and longitudes are two dimensional arrays in netcdf file?

Answers (2)

Related Questions