Reputation: 987
I have a large number (200+) of netCDF files, indexed by date/time, which contain 3-hourly measurements of precipitation for a single location and cover 20 years. A short example is shown below.
                      ppt  latitude  longitude
time
2017-03-01 00:00:00  0.00    16.625    -62.375
2017-03-01 03:00:00  0.00    16.625    -62.375
2017-03-01 06:00:00  0.00    16.625    -62.375
2017-03-01 09:00:00  0.00    16.625    -62.375
2017-03-01 12:00:00  0.00    16.625    -62.375
2017-03-01 15:00:00  0.00    16.625    -62.375
Each file contains a month's worth of data. My aim is to concatenate all of these files into one containing all the data for 20 years. So far I have deduced that a potential way forward is to extract the data from each netCDF file and put them into a dataframe:
import xarray as xr
import pandas as pd
ds = xr.open_dataset('ppt_1_201703.nc')
df = ds.to_dataframe()
If I had a small number of files, I could extract the data from each netCDF file manually and pd.concat([df, df2, df3]) would suffice. However, for such a large number of files, this approach would be time-consuming to say the least.
I think the best approach would be a for loop that cycles through the files by name and produces a dataframe for each. I would then need another for loop to concatenate the dataframes.
I am struggling with how to construct these loops. The file names are like this:
ppt_1_199801.nc
ppt_1_199802.nc
ppt_1_199803.nc
...
ppt_1_201610.nc
ppt_1_201611.nc
ppt_1_201612.nc
Are there any ideas out there? Sorry if the answer is easy (I am quite new to Python), but I could not find anything elsewhere that quite solved my problem. Thanks!
Upvotes: 3
Views: 3645
Reputation: 6464
Xarray provides the open_mfdataset() function, which should handle the open and concatenate steps for you. In your case, you can simply do:
import xarray as xr
ds = xr.open_mfdataset('ppt_1_*.nc')
df = ds.to_dataframe()
# or pass an explicit list of file names instead of a glob pattern
ds = xr.open_mfdataset(list_of_filenames)
df = ds.to_dataframe()
Either way, xarray will handle the open and concatenate steps within open_mfdataset for you. More info in the xarray docs: http://xarray.pydata.org/en/latest/io.html#combining-multiple-files
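The list_of_filenames variable above is left undefined. As a rough sketch, one way to build it is to generate one name per month from the naming scheme in the question. The helper name monthly_filenames and the 1998-01 to 2016-12 range are assumptions read off the file names listed in the question, so adjust them to your data.

```python
import os
from datetime import date


def monthly_filenames(start, end, template="ppt_1_{:%Y%m}.nc"):
    """Return one file name per month from start to end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(template.format(date(year, month, 1)))
        # Advance one month, rolling over into the next year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


# Assumed range, taken from the first and last file names in the question.
list_of_filenames = monthly_filenames(date(1998, 1, 1), date(2016, 12, 1))

# Optional sanity check: report any month that has no file on disk.
missing = [name for name in list_of_filenames if not os.path.exists(name)]
print(f"{len(list_of_filenames)} expected files, {len(missing)} missing")
```

Compared with the glob pattern, an explicit list makes a gap in the record show up before anything is opened.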
Edit 1:
In the event you're dealing with many files (too many to keep open at once), you can use the autoclose=True option within open_mfdataset. That would look like:
ds = xr.open_mfdataset('ppt_1_*.nc', autoclose=True)
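If you would rather keep only one file open at a time, the explicit loop described in the question is another option. The following is a minimal sketch, not a tested recipe for your data: the function names find_monthly_files and load_all are made up for illustration, and it assumes the files sit in the current directory and that every file has the same variables.

```python
from pathlib import Path


def find_monthly_files(directory=".", pattern="ppt_1_*.nc"):
    """Return matching files sorted by name (YYYYMM names sort chronologically)."""
    return sorted(Path(directory).glob(pattern))


def load_all(paths):
    """Open each file in turn and return a single concatenated dataframe."""
    # Imported here so that the file-listing helper above works without them.
    import pandas as pd
    import xarray as xr

    frames = []
    for path in paths:
        # The with-block closes each file before the next one is opened.
        with xr.open_dataset(path) as ds:
            frames.append(ds.to_dataframe())
    # A single pd.concat call joins all frames; no second loop is needed.
    return pd.concat(frames).sort_index()


if __name__ == "__main__":
    files = find_monthly_files()
    if files:
        df = load_all(files)
        print(df.shape)
```

Note that pd.concat accepts the whole list of dataframes at once, so the second loop mentioned in the question is not needed.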
Upvotes: 6