Convert netCDF files to csv

Question

I'm struggling to convert several Berkeley Earth netCDF files into CSV or another tabular format. I realize similar questions have been raised before, but I couldn't apply any of the solutions I came across.

For instance, this dataset.

ncdump from the netCDF utilities does not appear to generate an actual CSV file. I couldn't find any instruction on how to do so.
I've tried loading the data into a pandas dataframe with xarray.to_dataframe(), but my notebook cannot allocate the required memory.

In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: nc = xr.open_dataset('Complete_TAVG_Daily_EqualArea.nc')

In [4]: nc
Out[4]:

Dimensions:      (map_points: 5498, time: 50769)
Dimensions without coordinates: map_points, time
Data variables:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
    date_number  (time) float64 ...
    year         (time) float64 ...
    month        (time) float64 ...
    day          (time) float64 ...
    day_of_year  (time) float64 ...
    land_mask    (map_points) float64 ...

In [5]: df = nc.to_dataframe()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
(...)

MemoryError: Unable to allocate 532. MiB for an array with shape (279127962,) and data type int16

I've tried converting with Panoply. Its CSV export appears to handle only one variable at a time, and it writes that variable (which I'd like to see as a column) into a single-line file.

I must be missing something. Can someone help me?

titusjan · Accepted Answer

What you are missing is that netCDF is a much more sophisticated format than CSV. A netCDF file can contain multiple arrays of any shape and size. A CSV file can only contain a single array of at most two dimensions (or a set of 1D arrays if they all have the same length). You therefore cannot simply convert any netCDF file to CSV.
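To make the dimensionality limit concrete, here is a small sketch on a made-up 3D array (the name cube and its dimensions time, lat, lon are illustrative, not from your file). As far as I know, DataArray.to_pandas() only accepts arrays with up to two dimensions, while to_series() flattens any array into long form with one row per element. That flattening is likely what exhausted the memory in your to_dataframe() call: 5498 map points times 50769 time steps is exactly the 279127962 elements in your error message.

```python
import numpy as np
import xarray as xr

# Hypothetical 3D array: 2 time steps x 3 latitudes x 4 longitudes.
cube = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
    name="value",
)

# A table has rows and columns only, so the "wide" conversion is refused.
try:
    cube.to_pandas()
    wide_ok = True
except ValueError:
    wide_ok = False

# The "long" conversion always works, at the cost of one row per element,
# indexed by every combination of (time, lat, lon).
long_form = cube.to_series()

print(wide_ok)
print(len(long_form))
print(list(long_form.index.names))
```

The long form can be written with to_csv() as well, but the row count is the product of all dimension sizes, so it grows quickly.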

Let's look at the example file you gave. I repeat the info here with my version of Xarray, which seems to be a bit more verbose...

In [16]: ds = xr.open_dataset('Complete_TAVG_EqualArea.nc')

In [17]: ds
Out[17]:

Dimensions:      (map_points: 5498, month_number: 12, time: 3240)
Coordinates:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
  * time         (time) float64 1.75e+03 1.75e+03 1.75e+03 ... 2.02e+03 2.02e+03
Dimensions without coordinates: map_points, month_number
Data variables:
    land_mask    (map_points) float64 ...
    temperature  (time, map_points) float32 ...
    climatology  (month_number, map_points) float32 ...
Attributes:
    Conventions:          Berkeley Earth Internal Convention (based on CF-1.5)
    title:                Native Format Berkeley Earth Surface Temperature An...
    history:              16-Jan-2020 06:51:38
    institution:          Berkeley Earth Surface Temperature Project
    source_file:          Complete_TAVG.50985s.20200116T064041.mat
    source_history:       13-Jan-2020 17:22:52
    source_data_version:  ca6f26341938dae0ea7dd619bce6f15e
    comment:              This file contains Berkeley Earth surface temperatu...

There are three data variables (land_mask, temperature, climatology), plus three coordinate vectors (longitude, latitude, time). Perhaps you can include the coordinate vectors as the first row and column of a CSV file, but even then you need at least three separate CSV files per netCDF file.
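A minimal sketch of the one-file-per-variable idea, run on a tiny synthetic dataset that only mimics the layout printed above (the sizes, the values, and the helper name write_variables_to_csv are my own inventions, not part of xarray or of the Berkeley Earth files):

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import xarray as xr

# Tiny synthetic stand-in for the real file; all sizes and values are made up.
n_points, n_months, n_times = 4, 12, 6
ds = xr.Dataset(
    data_vars={
        "land_mask": ("map_points", np.ones(n_points)),
        "temperature": (
            ("time", "map_points"),
            np.zeros((n_times, n_points), dtype="float32"),
        ),
        "climatology": (
            ("month_number", "map_points"),
            np.zeros((n_months, n_points), dtype="float32"),
        ),
    },
    coords={
        "longitude": ("map_points", np.linspace(-180, 180, n_points)),
        "latitude": ("map_points", np.linspace(-90, 90, n_points)),
        "time": np.linspace(1750.0, 1750.5, n_times),
    },
)


def write_variables_to_csv(dataset, out_dir):
    """Write every data variable with at most 2 dimensions to <name>.csv."""
    out_dir = Path(out_dir)
    written = {}
    for name, data_array in dataset.data_vars.items():
        if data_array.ndim > 2:
            continue  # does not fit a single table; skipped in this sketch
        path = out_dir / f"{name}.csv"
        data_array.to_pandas().to_csv(path)
        written[name] = path
    return written


csv_paths = write_variables_to_csv(ds, tempfile.mkdtemp())

# Read one file back to check the layout: rows are months, columns are map points.
clim_shape = pd.read_csv(csv_paths["climatology"], index_col=0).shape
print(sorted(csv_paths))
print(clim_shape)
```

Note that the longitude and latitude vectors are not written here, because they are not index coordinates of any dimension. They would need a file of their own or extra columns.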

For example, you could write the climatology variable to CSV as follows:

In [31]: clim = ds['climatology']  

In [32]: clim.to_pandas().to_csv('clim.csv')

Here clim is an xarray.DataArray which, being two-dimensional, can in principle be written to a CSV file. Unfortunately the xarray.DataArray class does not have a to_csv method. The pandas.DataFrame class does, however, so we first convert clim to a pandas data frame with to_pandas(). See the parameter documentation of pandas.DataFrame.to_csv to tweak the generated output file.
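As an illustration of that kind of tweaking, the sketch below builds a miniature climatology array (three made-up map points, invented values), turns it so that each map point is a row, puts longitude and latitude in front as ordinary columns, and fixes the number of decimals with float_format. The column names month_0 ... month_11 and the index label map_point are my own choices. It writes to an in-memory buffer so the result can be inspected, but a file name works the same way.

```python
import io

import numpy as np
import xarray as xr

# Hypothetical miniature of the climatology variable: 12 months x 3 map points.
clim = xr.DataArray(
    np.arange(36, dtype="float32").reshape(12, 3) / 8,
    dims=("month_number", "map_points"),
    coords={
        "longitude": ("map_points", np.array([-120.0, 0.0, 120.0], dtype="float32")),
        "latitude": ("map_points", np.array([-45.0, 0.0, 45.0], dtype="float32")),
    },
    name="climatology",
)

# One row per map point, one column per month.
table = clim.transpose("map_points", "month_number").to_pandas()
table.columns = [f"month_{m}" for m in table.columns]

# Put the coordinates in front so that every row is self-describing.
table.insert(0, "latitude", clim["latitude"].values)
table.insert(0, "longitude", clim["longitude"].values)

buffer = io.StringIO()
table.to_csv(buffer, index_label="map_point", float_format="%.3f")
csv_lines = buffer.getvalue().splitlines()

print(csv_lines[0])  # header row
print(csv_lines[1])  # first map point
```

With this layout the coordinates travel with the data in a single file, at the price of repeating them if you also export temperature the same way.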
