Reputation: 2510
I'm struggling to convert several Berkeley Earth netCDF files into CSV or another tabular format. I realize similar questions have been raised before, but I couldn't apply any of the solutions I came across.
For instance, this dataset.
I tried the following:
ncdump from the netCDF utilities does not appear to generate an actual CSV file. I couldn't find any instructions on how to do so.
Building a pandas dataframe with xarray.to_dataframe(), but my notebook cannot allocate the required memory.
In [1]: import xarray as xr
In [2]: import pandas as pd
In [3]: nc = xr.open_dataset('Complete_TAVG_Daily_EqualArea.nc')
In [4]: nc
Out[4]:
<xarray.Dataset>
Dimensions: (map_points: 5498, time: 50769)
Dimensions without coordinates: map_points, time
Data variables:
longitude (map_points) float32 ...
latitude (map_points) float32 ...
date_number (time) float64 ...
year (time) float64 ...
month (time) float64 ...
day (time) float64 ...
day_of_year (time) float64 ...
land_mask (map_points) float64 ...
In [5]: df = nc.to_dataframe()
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
(...)
MemoryError: Unable to allocate 532. MiB for an array with shape (279127962,) and data type int16
Panoply. CSV export appears to work only for exporting a single variable (which I'd like to see as a column) into a single-line file.
I must be missing something. Can someone help me?
Upvotes: 2
Views: 6820
Reputation: 5546
What you are missing is that netCDF is a much more sophisticated format than CSV. A netCDF file can contain multiple arrays of any shape and size. A CSV file can only contain a single array of at most two dimensions (or a set of 1D arrays if they all have the same length). You therefore cannot simply convert an arbitrary netCDF file to a single CSV file.
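As a rough check on why flattening everything into one table runs out of memory, the arithmetic below uses only the dimension sizes printed in the question. It assumes that the failing array holds one int16 entry per (time, map_points) combination, which is what the shape in the MemoryError suggests.

```python
# Dimension sizes copied from the Dataset repr in the question.
n_map_points = 5498
n_time = 50769

# A long-format table has one row per (time, map_point) combination.
n_rows = n_map_points * n_time

# An int16 value takes 2 bytes, so one int16 column of that length needs:
size_mib = n_rows * 2 / 2**20

print(n_rows)
print(round(size_mib, 1))
```

The row count equals the shape reported in the traceback, and that is the cost of a single column only.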
Let's look at the example file you gave. I repeat the info here with my version of Xarray, which seems to be a bit more verbose...
In [16]: ds = xr.open_dataset('Complete_TAVG_EqualArea.nc')
In [17]: ds
Out[17]:
<xarray.Dataset>
Dimensions: (map_points: 5498, month_number: 12, time: 3240)
Coordinates:
longitude (map_points) float32 ...
latitude (map_points) float32 ...
* time (time) float64 1.75e+03 1.75e+03 1.75e+03 ... 2.02e+03 2.02e+03
Dimensions without coordinates: map_points, month_number
Data variables:
land_mask (map_points) float64 ...
temperature (time, map_points) float32 ...
climatology (month_number, map_points) float32 ...
Attributes:
Conventions: Berkeley Earth Internal Convention (based on CF-1.5)
title: Native Format Berkeley Earth Surface Temperature An...
history: 16-Jan-2020 06:51:38
institution: Berkeley Earth Surface Temperature Project
source_file: Complete_TAVG.50985s.20200116T064041.mat
source_history: 13-Jan-2020 17:22:52
source_data_version: ca6f26341938dae0ea7dd619bce6f15e
comment: This file contains Berkeley Earth surface temperatu...
There are three data variables (land_mask, temperature, climatology), plus three coordinate vectors (longitude, latitude, time). Perhaps you can include the coordinate vectors as the first row and column of a CSV file but even then this means you need at least three separate CSV files per netCDF file.
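The variables that share the map_points dimension are the easy case, because same-length 1D arrays fit into one table as columns. Here is a minimal sketch with small synthetic arrays standing in for the real variables; the values are made up, and only the variable names come from the file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for ds['longitude'], ds['latitude'] and ds['land_mask'].
# In the real file each of them has 5498 entries along map_points.
longitude = np.array([-179.0, -60.5, 0.0, 120.25], dtype="float32")
latitude = np.array([-89.0, -10.0, 45.5, 80.0], dtype="float32")
land_mask = np.array([1.0, 0.0, 0.25, 1.0])

# Same-length 1D arrays become the columns of a single table.
points = pd.DataFrame(
    {"longitude": longitude, "latitude": latitude, "land_mask": land_mask}
)
points.index.name = "map_point"

# Without a file name, to_csv returns the CSV text; pass a path to write a file.
csv_text = points.to_csv()
print(csv_text)
```

With the real dataset you would presumably build the columns from ds['longitude'].values and so on.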
So, for example, you could write the climatology variable to CSV as follows:
In [31]: clim = ds['climatology']
In [32]: clim.to_pandas().to_csv('clim.csv')
So clim is an xarray.DataArray which, in principle, can be written to a CSV file. Unfortunately, the xarray.DataArray class does not have a to_csv method. However, the pandas.DataFrame class does, so we first convert it to a pandas data frame with to_pandas(). Look at the parameter documentation of to_csv to tweak the generated output file.
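For the temperature variable, a wide table would have one column per map point. If you would rather have one row per (time, map point) pair without building the whole table in memory, something along these lines might work. It is only a sketch: write_long_csv, its parameters and the output file name are hypothetical, and the data below is synthetic. It assumes the values object can be sliced along its first (time) axis and converted with np.asarray, which I would expect to hold for a NumPy array or a lazily loaded xarray.DataArray.

```python
import numpy as np
import pandas as pd


def write_long_csv(path, time, longitude, latitude, values, chunk_size=100):
    """Write a (time, map_points) array as rows of time, longitude, latitude, temperature.

    Only `chunk_size` time steps are converted and held in memory at once.
    """
    time = np.asarray(time)
    longitude = np.asarray(longitude)
    latitude = np.asarray(latitude)
    n_points = longitude.size
    for start in range(0, time.size, chunk_size):
        stop = min(start + chunk_size, time.size)
        block = np.asarray(values[start:stop])  # shape (stop - start, n_points)
        frame = pd.DataFrame(
            {
                # Each time value is repeated once per map point ...
                "time": np.repeat(time[start:stop], n_points),
                # ... and the coordinate vectors are repeated once per time step.
                "longitude": np.tile(longitude, stop - start),
                "latitude": np.tile(latitude, stop - start),
                "temperature": block.reshape(-1),
            }
        )
        # The first chunk creates the file and writes the header; later chunks append.
        first = start == 0
        frame.to_csv(path, mode="w" if first else "a", header=first, index=False)


# Tiny synthetic example: 5 time steps, 3 map points.
time = np.array([2000.0, 2000.1, 2000.2, 2000.3, 2000.4])
longitude = np.array([10.0, 20.0, 30.0])
latitude = np.array([-5.0, 0.0, 5.0])
temperature = np.arange(15, dtype="float32").reshape(5, 3)

write_long_csv("temperature_long.csv", time, longitude, latitude, temperature, chunk_size=2)
result = pd.read_csv("temperature_long.csv")
print(result.head())
```

Keep in mind that for the daily file this still produces one row per (time, map point) pair, so the CSV itself will be very large even though memory use stays bounded.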
Upvotes: 7
Reputation: 11
You can convert a .nc to .csv using the CDO package suite.
Example code (you would need to edit some of the outputtab parameters):
cdo -outputtab,date,lon,lat,value infile.nc | awk 'FNR==1{ row=$2","$3","$4","$5;print row } FNR!=1{ row=$1","$2","$3","$4; print row}' > outfile.csv
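If awk is not available, the post-processing step can probably be done in Python instead. The sketch below assumes, as the awk script implies, that the table is whitespace-separated and that the header line starts with a separate `#` field. The function name table_to_csv and the sample lines are made up for illustration and are not real cdo output.

```python
import csv
import io


def table_to_csv(lines, out):
    """Convert whitespace-separated table lines to CSV rows written to `out`."""
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "#":
            fields = fields[1:]  # drop the comment marker in front of the header
        writer.writerow(fields)


# Hypothetical sample in the assumed layout.
sample = [
    "#      date     lon     lat   value",
    " 2000-01-01   10.00   -5.00  271.30",
    " 2000-01-01   20.00    0.00  280.10",
]

buffer = io.StringIO()
table_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Unlike the awk one-liner, this drops a leading `#` field on any line rather than only on the first, and it handles any number of columns. To use it on real output you would pass the lines of the cdo output (for example sys.stdin) and an open output file.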
Upvotes: 1