SHV_la

Reputation: 987

Python: loop to concatenate multiple (200+) netCDF files to form one file

I have a large number (200+) of netCDF files, indexed by date/time, that contain 3-hourly measurements of precipitation for a single location and together cover 20 years. A short example is shown below.

                        ppt     latitude    longitude
time            
2017-03-01 00:00:00     0.00    16.625      -62.375
2017-03-01 03:00:00     0.00    16.625      -62.375
2017-03-01 06:00:00     0.00    16.625      -62.375
2017-03-01 09:00:00     0.00    16.625      -62.375
2017-03-01 12:00:00     0.00    16.625      -62.375
2017-03-01 15:00:00     0.00    16.625      -62.375

Each file contains a month's worth of data. My aim is to concatenate all of these files into one containing all the data for 20 years. So far I have deduced that a potential way forward is to extract the data from each netCDF file and put them into a dataframe:

import xarray as xr
import pandas as pd

ds = xr.open_dataset('ppt_1_201703.nc')
df = ds.to_dataframe()

If I had a small number of files, I could extract the data from each netCDF file manually and pd.concat([df, df2, df3]) would suffice. However, for such a large number of files, this approach would be time-consuming, to say the least.

My thinking so far is that the best approach would be a for loop that cycles through the files by name and produces a dataframe for each. I would then need another for loop to concatenate the dataframes.

I am struggling with how to construct these loops. The file names look like this:

ppt_1_199801.nc
ppt_1_199802.nc
ppt_1_199803.nc
...
ppt_1_201610.nc
ppt_1_201611.nc
ppt_1_201612.nc

Are there any ideas out there? Sorry if the answer is easy (I am quite new to Python), but I could not find anything elsewhere that quite solved my problem. Thanks!

Upvotes: 3

Views: 3645

Answers (1)

jhamman

Reputation: 6464

Xarray provides the open_mfdataset() function, which should handle the open and concatenate steps for you. In your case, you can simply do:

import xarray as xr

ds = xr.open_mfdataset('ppt_1_*.nc')
df = ds.to_dataframe()

# or, with an explicit list of file names
ds = xr.open_mfdataset(list_of_filenames)
df = ds.to_dataframe()

Either way, no explicit loop is needed: open_mfdataset opens each file and concatenates the results. More info in the xarray docs: http://xarray.pydata.org/en/latest/io.html#combining-multiple-files
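If you prefer the explicit list form, here is a minimal sketch of one way to build the list. The helper names list_monthly_files and open_monthly_files are made up for illustration and are not part of xarray. It assumes the files sit in a single directory and follow the ppt_1_YYYYMM.nc naming from the question, so sorting by name also sorts by date.

```python
from pathlib import Path


def list_monthly_files(directory=".", pattern="ppt_1_*.nc"):
    """Return the paths matching `pattern` as strings, sorted by name.

    Because the names end in a zero-padded YYYYMM stamp, sorting by
    name puts the files in chronological order.
    """
    return sorted(str(path) for path in Path(directory).glob(pattern))


def open_monthly_files(directory="."):
    """Open every matching file as one dataset (hypothetical helper)."""
    # Imported here so the file-listing helper works without xarray.
    import xarray as xr

    return xr.open_mfdataset(list_monthly_files(directory))
```

Printing the result of list_monthly_files() first is a cheap way to check that the pattern picks up exactly the files you expect before opening them.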

Edit 1:

In the event you're dealing with many files (too many to keep open at once), you can use the autoclose=True option within open_mfdataset. That would look like:

ds = xr.open_mfdataset('ppt_1_*.nc', autoclose=True)
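Since the stated goal is a single file rather than a dataframe, the combined dataset can also be written straight back to disk with to_netcdf(). The following is only a sketch: the function name combine_to_single_file and the output name combined_ppt.nc are assumptions chosen for illustration, and it presumes xarray, dask and a netCDF backend are installed.

```python
def combine_to_single_file(pattern="ppt_1_*.nc", out_path="combined_ppt.nc"):
    """Open all files matching `pattern` and write them to one netCDF file.

    The default output name deliberately does not match the input
    pattern, so a second run will not pick up the combined file as input.
    """
    # Imported here so the function can be defined without xarray installed.
    import xarray as xr

    # open_mfdataset opens and concatenates the monthly files; the
    # `with` block closes the underlying files once the write is done.
    with xr.open_mfdataset(pattern) as ds:
        ds.to_netcdf(out_path)
    return out_path
```

Calling combine_to_single_file() from the directory that holds the monthly files should then leave one combined_ppt.nc next to them, which can be reopened later with xr.open_dataset().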

Upvotes: 6
