bart cubrich

Reputation: 1254

How to import netCDF4 file with xarray when index names have multiple dimensions?

When I try to import netCDF4 files using xarray I get the following error:

MissingDimensionsError: 'name' has more than 1-dimension and the same name as one of its dimensions ('time', 'name'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

However, I can import these files successfully with the netCDF4 Python library and get the data I need from them. The problem is that this approach is very slow, so I was looking for something faster and wanted to try xarray. Here is an example file, and the code that gives me the error in question.

from datetime import datetime
import os
from tkinter import Tk
from tkinter.filedialog import askdirectory

from netCDF4 import Dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr

# Use this function to get the name of the directory where the files are
def get_dat():
    root = Tk()
    root.withdraw()                     # hide the empty main window
    root.focus_force()
    root.attributes("-topmost", True)   # make the dialog appear on top
    directory = askdirectory()          # ask the user to pick a directory
    root.destroy()
    return directory

directory = get_dat()

# Loop through the files in the directory and read the netCDF4 files
for filename in os.listdir(directory):      # loop through files in the user's dir
    if filename.endswith(".nc"):            # all my files are .nc, not .nc4
        runstart = datetime.now()
        # I get the error right here
        rootgrp3 = xr.open_dataset(os.path.join(directory, filename))
        # more stuff happens here with the data, but this part works

Upvotes: 2

Views: 1914

Answers (3)

rusty

Reputation: 41

When the variables in question are not too large, a small Python-only solution is possible:

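import xarray as xr
import netCDF4

def xr_open_with_rename(fn, renames):
    """Open fn with xarray, re-adding the variables in `renames`
    (a dict mapping old variable name -> new name) under their new names."""
    # Skip the conflicting variables so xarray can open the rest of the file
    ds = xr.open_dataset(fn, drop_variables=list(renames))
    # Read the skipped variables with netCDF4 and add them back under new names
    with netCDF4.Dataset(fn) as ds_nc:
        for old, new in renames.items():
            nc_var = ds_nc[old]
            ds[new] = (nc_var.dimensions, nc_var[...])
    # Mark the re-added variables as (non-index) coordinates
    ds = ds.set_coords(list(renames.values()))
    return ds

# e.g. open GOTM output (nc_fn is the path to the file)
ds = xr_open_with_rename(nc_fn, dict(z="z_coord", zi="zi_coord"))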

This has been useful for me when installing NCO is a problem.
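If you would rather not hard-code the `dict(z="z_coord", zi="zi_coord")` mapping, a small helper could build it from the list of conflicting names while making sure no new name collides with a variable or dimension already in the file. This is a hypothetical sketch: `build_renames`, the `_coord` suffix and the numbered fallback are arbitrary choices, not something xarray or GOTM requires.

```python
def build_renames(conflicting, existing, suffix="_coord"):
    """Map each conflicting name to a new, unused name.

    conflicting: names to rename (e.g. ["z", "zi"]).
    existing: all variable and dimension names already in the file.
    Tries name + suffix first, then name + suffix + "1", "2", ...
    """
    taken = set(existing)
    renames = {}
    for name in conflicting:
        candidate = name + suffix
        n = 1
        while candidate in taken:
            candidate = f"{name}{suffix}{n}"
            n += 1
        taken.add(candidate)  # never hand out the same new name twice
        renames[name] = candidate
    return renames


print(build_renames(["z", "zi"], ["time", "z", "zi", "lat", "lon"]))
# {'z': 'z_coord', 'zi': 'zi_coord'}
```

The resulting dict could then be passed as the `renames` argument of `xr_open_with_rename`.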

Upvotes: 1

Tom F

Reputation: 339

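In IPython, this is an easy workaround:

!ncrename -v name,name_matrix filename.nc  # rename variable 'name' to 'name_matrix' to avoid the dimension/variable name conflict in xarray; requires NCO (e.g. on Linux)

Note that with a single file argument, ncrename modifies filename.nc in place.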

Upvotes: 0

acapet

Reputation: 120

The issue is still present. The problem arises when a coordinate has multiple dimensions and has the same name as one of those dimensions.

For example, the result.nc output files produced by the GOTM model have this problem for the coordinates z and zi:

dimensions:
    time = UNLIMITED ; // (4018 currently)
    lon = 1 ;
    lat = 1 ;
    z = 218 ;
    zi = 219 ;
variables:
    ... 
    float z(time, z, lat, lon) ;
    float zi(time, zi, lat, lon) ;
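xarray's rule, as stated in the error message and the explanation above, is that a variable may not share its name with one of its dimensions unless it is one-dimensional. A minimal sketch for spotting the offending variables before opening a file with xarray might look like this; the function names are hypothetical, and the file-based helper assumes the netCDF4 library is installed (it is imported lazily so the pure check works without it).

```python
def conflicting_variables(var_dims):
    """Return the names of variables that xarray refuses to open.

    var_dims maps each variable name to its tuple of dimension names.
    A variable conflicts when its own name is one of its dimensions
    and it has more than one dimension.
    """
    return sorted(
        name for name, dims in var_dims.items()
        if name in dims and len(dims) > 1
    )


def conflicting_variables_in_file(path):
    """Same check, reading the variable/dimension layout with netCDF4."""
    import netCDF4  # assumed installed; imported here so the check above works without it

    with netCDF4.Dataset(path) as nc:
        var_dims = {name: var.dimensions for name, var in nc.variables.items()}
    return conflicting_variables(var_dims)


# Part of the GOTM result.nc layout shown above
gotm = {
    "time": ("time",),
    "z": ("time", "z", "lat", "lon"),
    "zi": ("time", "zi", "lat", "lon"),
}
print(conflicting_variables(gotm))  # ['z', 'zi']
```

The names this returns are the ones that need renaming, whether with ncrename or in Python.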

It has been proposed here to add a 'rename_var' kwarg to xr.open_dataset() as a workaround, but to my knowledge it hasn't been implemented yet.

The quick workaround I use is to call NCO's ncrename from Python where needed.

In my case:

 os.system('ncrename -v z,z_coord -v zi,zi_coord result.nc resultxr.nc')

This allows

 r2 = xr.open_dataset(testdir+'resultxr.nc')

while

 r = xr.open_dataset(testdir+'result.nc')

was failing.
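The same rename can be run from Python without building a shell string. A minimal sketch, assuming NCO's ncrename is on the PATH: it builds the argument list (one `-v old,new` pair per variable, then the input and output files) and runs it with `subprocess.run(..., check=True)`, which raises if ncrename fails, whereas `os.system` only returns an exit status. The function names are hypothetical.

```python
import subprocess


def ncrename_command(renames, src, dst):
    """Build an ncrename argument list renaming variables old -> new.

    renames maps old variable names to new ones; writing to dst leaves
    the original src file untouched.
    """
    cmd = ["ncrename"]
    for old, new in renames.items():
        cmd += ["-v", f"{old},{new}"]
    return cmd + [src, dst]


def ncrename(renames, src, dst):
    """Run ncrename (NCO must be installed); raises CalledProcessError on failure."""
    subprocess.run(ncrename_command(renames, src, dst), check=True)


print(ncrename_command({"z": "z_coord", "zi": "zi_coord"}, "result.nc", "resultxr.nc"))
# ['ncrename', '-v', 'z,z_coord', '-v', 'zi,zi_coord', 'result.nc', 'resultxr.nc']
```

Passing a list instead of a single string also avoids quoting problems when file paths contain spaces.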

Upvotes: 6
