Pramod

Reputation: 89

Split a single monthly NetCDF file into multiple daily averaged NetCDF files using xarray

I have one NetCDF file for the month of September 2007. It contains 6-hourly data on a latitude/longitude grid, with wind and humidity variables. Each variable has shape (120, 45, 93): 120 time steps (4 per day), 45 latitudes and 93 longitudes. With the code below I can compute the daily average of every variable, after which each variable has shape (30, 45, 93). The time variable is stored as an integer with units of 'hours since 1900-01-01 00:00:00.0'.

How can I split this daily averaged data into 30 separate NetCDF files, one per day, with each file name containing the date in YYYY:MM:DD format?

import xarray as xr
monthly_data = xr.open_dataset('interim_2007-09-01to2007-09-31.nc') 
daily_data = monthly_data.resample(time='1D').mean()

Upvotes: 3

Views: 3170

Answers (3)

ClimateUnboxed

Reputation: 8087

Just in case it helps anyone, it is also possible to perform this task of calculating the daily mean and dividing into separate daily files directly from the command line:

cdo splitday -daymean in.nc day

which produces a series of files day01.nc day02.nc ...
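
The files written by cdo are numbered by day of month rather than named by full date. If date-based names are wanted, the mapping could be sketched in Python as below. This is only an illustration: `dated_name` is a hypothetical helper, it assumes the `day` prefix used in the command above, and the year and month have to be supplied by hand because they are not part of the cdo file names.

```python
from pathlib import Path


def dated_name(cdo_file, year, month):
    """Map a cdo output name such as 'day07.nc' to '2007-09-07.nc'."""
    stem = Path(cdo_file).stem        # e.g. "day07"
    day = int(stem[len("day"):])      # strip the assumed "day" prefix
    return f"{year:04d}-{month:02d}-{day:02d}.nc"


# Build the old-name -> new-name mapping without touching the disk
renames = {name: dated_name(name, 2007, 9)
           for name in ["day01.nc", "day02.nc", "day30.nc"]}
print(renames)
```

Each pair in `renames` could then be passed to `Path(old).rename(new)` to rename the files on disk.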

Upvotes: 1

jhamman

Reputation: 6434

Xarray has a top-level function for cases like this: xarray.save_mfdataset. In your case, you would use groupby to break the daily averaged dataset into one piece per day and then build a list of matching file names. From there, save_mfdataset does the rest.

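import pandas as pd
import xarray as xr
# ds is the monthly dataset, as returned by xr.open_dataset
dates, datasets = zip(*ds.resample(time='1D').mean('time').groupby('time'))
filenames = [pd.to_datetime(date).strftime('%Y.%m.%d') + '.nc' for date in dates]
xr.save_mfdataset(datasets, filenames)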
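
For anyone who wants to try the approach without the original file, here is a minimal sketch on synthetic data. The variable name `wind`, the dimension names, and the helpers `daily_mean_pieces` and `write_daily_files` are assumptions made for illustration, not part of the question's file. The pieces are cut with `isel` instead of `groupby` so that each one keeps a length-1 time dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def daily_mean_pieces(ds):
    """Return one single-day dataset and one file name per day in ds."""
    daily = ds.resample(time="1D").mean()
    days = pd.to_datetime(daily["time"].values)
    filenames = [day.strftime("%Y.%m.%d") + ".nc" for day in days]
    # isel with a list keeps a length-1 time dimension in every piece
    datasets = [daily.isel(time=[i]) for i in range(daily.sizes["time"])]
    return datasets, filenames


def write_daily_files(ds):
    """Write the daily means to disk (needs a NetCDF backend installed)."""
    datasets, filenames = daily_mean_pieces(ds)
    xr.save_mfdataset(datasets, filenames)
    return filenames


# Synthetic stand-in for the September 2007 file: 120 six-hourly steps
times = pd.date_range("2007-09-01", periods=120, freq=pd.Timedelta(hours=6))
rng = np.random.default_rng(0)
monthly = xr.Dataset(
    {"wind": (("time", "latitude", "longitude"), rng.random((120, 45, 93)))},
    coords={"time": times},
)

datasets, filenames = daily_mean_pieces(monthly)
print(len(filenames), filenames[0], filenames[-1])
```

Calling `write_daily_files(monthly)` would then write the 30 files into the current working directory.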

Upvotes: 6

sam46

Reputation: 1271

According to the documentation, netCDF4's num2date converts an integer time value to a date, given the units string. You can also select a single time step from an xarray.Dataset by index using isel():

from netCDF4 import num2date
for i in range(30):
    day = daily_data.isel(time=i)
    the_date = num2date(day.time.data, units='hours since 1900-01-01 00:00:00')
    day.to_netcdf(str(the_date.date())+'.nc', format='NETCDF4')
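
One caveat: xarray's open_dataset decodes CF time units by default, so the time coordinate may already hold datetime64 values rather than integers, in which case no units string is needed to get the date. The sketch below shows both conversions using only pandas and numpy. The helper names are hypothetical, and the epoch is taken from the units string in the question.

```python
import numpy as np
import pandas as pd

# Epoch from the units string 'hours since 1900-01-01 00:00:00.0'
EPOCH = pd.Timestamp("1900-01-01 00:00:00")


def hours_to_date_string(hours):
    """Convert integer hours since EPOCH to 'YYYY-MM-DD'."""
    return (EPOCH + pd.Timedelta(hours=int(hours))).strftime("%Y-%m-%d")


def datetime64_to_date_string(value):
    """Convert an already decoded numpy.datetime64 to 'YYYY-MM-DD'."""
    return pd.Timestamp(value).strftime("%Y-%m-%d")


# Raw integer value that the first time step would have in the file
first_step_hours = (pd.Timestamp("2007-09-01") - EPOCH) // pd.Timedelta(hours=1)
print(hours_to_date_string(first_step_hours))                        # 2007-09-01
print(datetime64_to_date_string(np.datetime64("2007-09-30T00:00")))  # 2007-09-30
```

Inside the loop above, `datetime64_to_date_string(day.time.values) + '.nc'` could serve as the file name when the times are already decoded.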

Upvotes: 2
