Reputation: 21
I'm trying to interpolate some data from air monitoring stations.
Almost every record has an air quality value along with its latitude and longitude, but some records are missing the value. For example, the data looks like this:
116° 42° 10
117° 43° missing
120° 20° 1000
I want to use scikit-learn's GPR (GaussianProcessRegressor) to interpolate the missing values.
I know that 2-D data can be handled as in the last answer to this question: Python - Kriging (Gaussian Process) in scikit_learn
My problem is that I shouldn't use the latitude and longitude directly for this task, because the earth is a sphere, so latitude/longitude is not a regular flat 2-D grid.
How do I define the distance function between points when using scikit-learn's GPR? Or should I just project the lat/lon points onto a plane and use those? I haven't tried this because the precision loss during projection made me sad :(
Thanks for any suggestions :)
P.S. The distance between two lat/lon points can be calculated with the Haversine formula, as in: Calculate distance between two latitude-longitude points? (Haversine formula)
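For reference, a minimal sketch of the Haversine formula in plain Python. The function name haversine_km, the argument order (latitude first, then longitude, both in degrees), and the mean Earth radius of 6371 km are assumptions made for illustration; they do not come from the linked question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Assuming the sample rows above are ordered longitude, latitude, value, the distance between the first two stations would be haversine_km(42, 116, 43, 117).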
Upvotes: 1
Views: 1122
Reputation: 5684
The Variogram() constructor has a parameter dist_func, which defaults to 'euclidean'. You could try replacing 'euclidean' with a custom haversine function.
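See scipy.spatial.distance.pdist, which also accepts a user-defined function as its metric:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
For example, passing a Euclidean distance as a lambda:
dm = pdist(X, lambda u, v: np.sqrt(((u-v)**2).sum()))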
I haven't tried it.
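As a rough sketch of that idea, the Euclidean lambda can be swapped for a haversine callable passed to pdist. The function name haversine, the (latitude, longitude) column order in degrees, and the 6371 km Earth radius are assumptions for illustration. Whether Variogram() accepts such a callable directly as dist_func is not verified here.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius in kilometres


def haversine(u, v):
    """Great-circle distance in km; u and v are (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before arcsin.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(min(1.0, a)))


# Station coordinates as (lat, lon) rows; the column order is an assumption.
X = np.array([[42.0, 116.0],
              [43.0, 117.0],
              [20.0, 120.0]])

dm = pdist(X, haversine)  # condensed vector of pairwise distances
D = squareform(dm)        # full symmetric distance matrix
```

pdist calls the function once per pair of rows, so this is slow for many stations; a vectorized implementation would be preferable for large data sets.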
Upvotes: 0