Reputation: 391
I'm trying to fill NaN values in a NetCDF file (let's call it the 'Target' file) with values from another NetCDF file (the 'Source' file). [the two example files can be downloaded from here] I was thinking of doing this in Python using the following framework:
Step 1: Identify the NaN values in the Target file, extract their locations (lat/lon), and store them in a dataframe
Step 2: Extract the values at the stored lat/lon locations from the Source file
Step 3: Write these values into the Target file
I came up with the following code:
import pandas as pd
import xarray as xr
import numpy as np
Source = xr.open_dataset("Source.nc")
Target = xr.open_dataset("Target.nc")
#Step 1
df = Target.to_dataframe()
df=df.reset_index()
# the string 'nan' never matches a float NaN, so test for missing values with isna()
df2 = df.loc[df['ET'].isna() | (df['ET'] == 32767)]
#Step2
lat = df2["lat"]
lon = df2["lon"]
point_list = zip(lat, lon)
frames = []
for i, j in point_list:
    dsloc = Source.sel(lat=i, lon=j, method='nearest')
    frames.append(dsloc.to_dataframe())
# DataFrame.append was removed in pandas 2.0, so concatenate once after the loop
Newdf = pd.concat(frames, sort=True)
There are three issues with this:
1- I don't know how to do step three.
2- The second step takes forever to complete, perhaps because there are many missing points.
3- This covers only one time step, using the two files.
So I believe there may be better, easier, and faster ways to do this in Python or CDO/NCO. Any ideas and solutions are welcome. Thank you. Note that the two NC files have different spatial resolutions (dimensions).
Upvotes: 2
Views: 959
Reputation: 6444
You can use xarray's where method for this. You really want to stay away from a Python for loop if you are concerned with efficiency at all. Here's an example of how this would work:
# these are the points you want to keep
# you can fine tune this further (exclude values over a threshold)
condition = target.notnull()
# fill the values where condition is false
target_filled = target.where(condition, source)
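Since the question says the two files are on different grids, the source probably has to be brought onto the target's lat/lon coordinates before where can combine them. Here is a minimal sketch of one way to do that, using reindex_like with nearest-neighbour matching. The small grids, the values, and the name source_on_target are made up for illustration; only the variable name ET and the 32767 sentinel come from the question.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for Target.nc and Source.nc (hypothetical grids and values)
target_vals = np.arange(16, dtype=float).reshape(4, 4)
target_vals[1, 2] = np.nan      # a missing cell
target_vals[3, 0] = 32767.0     # a cell holding the sentinel value from the question
target = xr.Dataset(
    {"ET": (("lat", "lon"), target_vals)},
    coords={"lat": [10.0, 11.0, 12.0, 13.0], "lon": [50.0, 51.0, 52.0, 53.0]},
)

# Coarser source grid whose cell centres do not coincide with the target's
source = xr.Dataset(
    {"ET": (("lat", "lon"), np.array([[100.0, 200.0], [300.0, 400.0]]))},
    coords={"lat": [10.5, 12.5], "lon": [50.5, 52.5]},
)

# Treat the sentinel as missing so a single notnull() test covers both cases
target = target.where(target != 32767)

# Put the source on the target grid by picking the nearest source cell
source_on_target = source.reindex_like(target, method="nearest")

# Keep target values where they exist, otherwise take the regridded source
target_filled = target.where(target.notnull(), source_on_target)

print(target_filled["ET"].values)
```

Nearest-neighbour matching mirrors the method='nearest' lookup in the question; interpolation would be a different choice. The filled dataset could then be written out with target_filled.to_netcdf(...), which would cover step 3. Because everything operates on whole arrays, the same lines should also handle a time dimension, assuming the time coordinates of the two files match.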
Upvotes: 2