mrgou

Reputation: 2510

Convert netCDF files to csv

I'm struggling to convert several Berkeley Earth netCDF files into CSV or another tabular format. I realize similar questions have been raised before, but I couldn't apply any of the solutions I came across.

For instance, this dataset.

In [1]: import xarray as xr

In [2]: import pandas as pd

In [3]: nc = xr.open_dataset('Complete_TAVG_Daily_EqualArea.nc')

In [4]: nc
Out[4]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, time: 50769)
Dimensions without coordinates: map_points, time
Data variables:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
    date_number  (time) float64 ...
    year         (time) float64 ...
    month        (time) float64 ...
    day          (time) float64 ...
    day_of_year  (time) float64 ...
    land_mask    (map_points) float64 ...

In [5]: df = nc.to_dataframe()
---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
(...)

MemoryError: Unable to allocate 532. MiB for an array with shape (279127962,) and data type int16

I must be missing something. Can someone help me?

Upvotes: 2

Views: 6820

Answers (2)

titusjan

Reputation: 5546

What you are missing is that netCDF is a much more sophisticated format than CSV. A netCDF file can contain multiple arrays of any shape and size. A CSV file can only contain a single array of at most two dimensions (or a set of 1D arrays if they all have the same length). You therefore cannot simply convert an arbitrary netCDF file to CSV.
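This mismatch is also a plausible explanation for the MemoryError in the question. Dataset.to_dataframe() indexes its result by the Cartesian product of all dimensions, so it tries to build one row per (map_points, time) combination. A quick check of the arithmetic, using only the numbers from the question's output:

```python
# Dimension sizes taken from the dataset repr in the question.
map_points = 5498
time_steps = 50769

# Dataset.to_dataframe() builds one row per (map_points, time) combination.
rows = map_points * time_steps
print(rows)  # 279127962, the array shape reported in the MemoryError

# An int16 array of that length needs 2 bytes per element.
size_mib = rows * 2 / 2**20
print(round(size_mib, 1))  # 532.4
```

Both numbers match the error message, which suggests the failed allocation was part of the index for that full product rather than the data itself.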

Let's look at the example file you gave. I repeat the info here with my version of Xarray, which seems to be a bit more verbose...

In [16]: ds = xr.open_dataset('Complete_TAVG_EqualArea.nc')

In [17]: ds
Out[17]:
<xarray.Dataset>
Dimensions:      (map_points: 5498, month_number: 12, time: 3240)
Coordinates:
    longitude    (map_points) float32 ...
    latitude     (map_points) float32 ...
  * time         (time) float64 1.75e+03 1.75e+03 1.75e+03 ... 2.02e+03 2.02e+03
Dimensions without coordinates: map_points, month_number
Data variables:
    land_mask    (map_points) float64 ...
    temperature  (time, map_points) float32 ...
    climatology  (month_number, map_points) float32 ...
Attributes:
    Conventions:          Berkeley Earth Internal Convention (based on CF-1.5)
    title:                Native Format Berkeley Earth Surface Temperature An...
    history:              16-Jan-2020 06:51:38
    institution:          Berkeley Earth Surface Temperature Project
    source_file:          Complete_TAVG.50985s.20200116T064041.mat
    source_history:       13-Jan-2020 17:22:52
    source_data_version:  ca6f26341938dae0ea7dd619bce6f15e
    comment:              This file contains Berkeley Earth surface temperatu...

There are three data variables (land_mask, temperature, climatology), plus three coordinate vectors (longitude, latitude, time). Perhaps you can include the coordinate vectors as the first row and column of a CSV file, but even then you need at least three separate CSV files per netCDF file, because the three data variables have different shapes.

So, for example, you could write the climatology variable to CSV as follows:

In [31]: clim = ds['climatology']  

In [32]: clim.to_pandas().to_csv('clim.csv') 

So clim is an xarray.DataArray which, in principle, can be written to a CSV file. Unfortunately the xarray.DataArray class does not have a to_csv method. However, the pandas.DataFrame class does, so we first convert it to a pandas data frame with to_pandas(). See the documentation of pandas.DataFrame.to_csv for the parameters that tweak the generated output file.
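For the much larger temperature variable, converting everything in one go may not fit in memory. A minimal sketch, assuming a long format (one row per time step and map point) is acceptable: convert a few time steps at a time and append them to the same CSV. The tiny synthetic dataset, the write_long_csv helper, and the chunk size are hypothetical stand-ins for illustration, not part of the Berkeley Earth file or the xarray API.

```python
import io

import numpy as np
import xarray as xr

# Small synthetic stand-in for the real file (made-up values, same layout).
ds = xr.Dataset(
    data_vars={
        "temperature": (
            ("time", "map_points"),
            np.arange(6, dtype="float32").reshape(2, 3),
        ),
    },
    coords={
        "time": np.array([1750.0, 1750.5]),
        "longitude": (("map_points",), np.array([10.0, 20.0, 30.0], dtype="float32")),
        "latitude": (("map_points",), np.array([-5.0, 0.0, 5.0], dtype="float32")),
    },
)


def write_long_csv(da, out, chunk=1):
    """Write a (time, map_points) DataArray as long CSV, `chunk` time steps at a time."""
    for start in range(0, da.sizes["time"], chunk):
        block = da.isel(time=slice(start, start + chunk))
        # One row per (time, map_points) pair; the index levels become columns.
        frame = block.to_dataframe().reset_index()
        # Only the first chunk writes the header line.
        frame.to_csv(out, index=False, header=(start == 0))


buf = io.StringIO()
write_long_csv(ds["temperature"], buf)
csv_text = buf.getvalue()
print(csv_text)
```

With the real file you would pass an open file handle instead of the StringIO buffer and pick a chunk size that fits comfortably in memory.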

Upvotes: 7

Kyle L

Reputation: 11

You can convert a .nc to .csv using the CDO package suite.

Example code (you would need to edit some of the outputtab parameters):

cdo -outputtab,date,lon,lat,value infile.nc | awk 'FNR==1{ row=$2","$3","$4","$5;print row  } FNR!=1{ row=$1","$2","$3","$4; print row}' > outfile.csv

Upvotes: 1
