jdcaballerov

Reputation: 1462

How can one use KNeighborsRegressor with haversine metric?

Let's say I have data points with a number of features that includes lat, long coordinates.

I would like to use a KNeighborsRegressor with the "haversine" metric on the lat/long values. How should X_train and Y_train be prepared for the regressor?

from sklearn.neighbors import KNeighborsRegressor

clf = KNeighborsRegressor(n_neighbors=num_neigh, weights=myweights, algorithm='ball_tree', metric='haversine')

clf.fit(X_train, Y_train)

Alternatively, if I decide to write my own metric, the function receives a NumPy ndarray with 10 values per point. How can I tell the lats and longs apart?

Upvotes: 1

Views: 2233

Answers (1)

arthur

Reputation: 2399

From the sklearn documentation: http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html

For the first part of your question, using the haversine metric for KNN regression, the documentation lists haversine under "Metrics intended for two-dimensional vector spaces" and says:

Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.

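So the first column of your X_train should be latitude and the second column should be longitude, both converted from degrees to radians. Because the metric is only defined for two dimensions, X_train must contain exactly these two columns when metric="haversine" is used; any other features have to be left out. The distances returned are also in radians, so multiply by the Earth's radius (about 6371 km) to get kilometres.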
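As a rough illustration of the data preparation described above, here is a minimal sketch. The city coordinates, target values and parameter choices are made up for the example and are not taken from the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical training data: [latitude, longitude] in degrees.
coords_deg = np.array([
    [48.8566, 2.3522],    # Paris
    [51.5074, -0.1278],   # London
    [52.5200, 13.4050],   # Berlin
    [40.4168, -3.7038],   # Madrid
])
y_train = np.array([10.0, 20.0, 30.0, 40.0])  # made-up target values

# The haversine metric expects [latitude, longitude] in radians.
X_train = np.radians(coords_deg)

reg = KNeighborsRegressor(
    n_neighbors=2,
    weights="distance",
    algorithm="ball_tree",
    metric="haversine",
)
reg.fit(X_train, y_train)

# Query points must be converted to radians in the same way.
query = np.radians([[50.8503, 4.3517]])  # Brussels
dist_rad, idx = reg.kneighbors(query)
dist_km = dist_rad * 6371.0  # radians -> kilometres (mean Earth radius)
prediction = reg.predict(query)
```

Here `idx` holds the row indices of the nearest training points and `dist_km` their great-circle distances. Any non-geographic features would need to be handled separately, since this metric accepts only the two coordinate columns.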

Now for the second part of your question: if you define your own metric, you choose the layout of X_train, so you decide which columns hold the latitude and longitude and read them by index inside your function. But remember that if you want to use the "ball_tree" algorithm, your metric has to be a true distance in the mathematical sense:

Non-negativity: d(x, y) >= 0
Identity: d(x, y) = 0 if and only if x == y
Symmetry: d(x, y) = d(y, x)
Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)

Otherwise, you will only be able to use the "brute" algorithm.
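To make the custom-metric route concrete, here is a small sketch for points with 10 values each. It assumes latitude and longitude (in radians) sit in columns 0 and 1 and that the other 8 columns are ordinary numeric features. The column positions, the name mixed_distance and the choice to simply add the two parts are assumptions for the example, not something scikit-learn prescribes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Assumed layout: column 0 = latitude, column 1 = longitude (radians),
# columns 2..9 = other numeric features.
LAT_COL, LON_COL = 0, 1

def mixed_distance(a, b):
    """Great-circle distance on the coordinates plus Euclidean distance on the rest."""
    lat1, lon1 = a[LAT_COL], a[LON_COL]
    lat2, lon2 = b[LAT_COL], b[LON_COL]
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    geo = 2 * np.arcsin(np.sqrt(min(1.0, h)))   # haversine, in radians
    other = np.linalg.norm(a[2:] - b[2:])       # remaining 8 features
    return geo + other                          # unweighted sum: an assumption

# Made-up data with 10 values per point.
rng = np.random.default_rng(0)
n = 20
lat = np.radians(rng.uniform(-60, 60, n))
lon = np.radians(rng.uniform(-180, 180, n))
features = rng.normal(size=(n, 8))
X_train = np.column_stack([lat, lon, features])
y_train = rng.normal(size=n)

# "brute" works with any callable and makes no assumption about metric properties.
reg = KNeighborsRegressor(n_neighbors=3, algorithm="brute", metric=mixed_distance)
reg.fit(X_train, y_train)
pred = reg.predict(X_train[:2])
```

The function receives two 1-D arrays, one per point, so the coordinates are recovered purely by position. In practice the two parts would probably need rescaling or weighting so that neither dominates, and a Python callable is much slower than a built-in metric.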

Upvotes: 2
