Reputation: 1430
I have data that looks like the following (all values are strings):
>>> all_states[0:3]
[['A','B','Empty'],
['A', 'B', 'Empty'],
['C', 'D', 'Empty']]
I want to use a custom distance metric
def mydist(x, y):
    return 1
neigh = NearestNeighbors(n_neighbors=5, metric=mydist)
However, when I call
neigh.fit(np.array(all_states))
I get the error
ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'
I know that I can use the OneHotEncoder or the LabelEncoder, but can I also fit the model without encoding the data, given that I have my own distance metric?
Upvotes: 3
Reputation: 367
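Also note that to use neigh.kneighbors with metric='precomputed' and custom query points, you have to pass it the distances between the query points and the fitted points, i.e. cdist(query_points, all_states, mydist), an array of shape (n_queries, n_fitted_samples) (cdist doc). The custom metric must be passed to cdist explicitly, because cdist defaults to the Euclidean metric, which does not work on strings.
For example,
from scipy.spatial.distance import cdist
... # initialize and fit `neigh` as in @StupidWolf's answer
print(neigh.kneighbors(cdist(query_points, all_states, mydist)))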
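To make the shape of that input concrete, here is a minimal pure-Python sketch of what the rectangular query-to-fitted distance matrix contains and how the nearest indices fall out of it. It is not how scikit-learn or SciPy implement this; `mismatch_dist`, `cross_distances`, `k_nearest` and the sample `query_points` are hypothetical names and data made up for illustration.

```python
def mismatch_dist(x, y):
    """Hypothetical metric: number of positions where two rows differ."""
    return sum(a != b for a, b in zip(x, y))


def cross_distances(queries, fitted, metric):
    """One row per query point, one column per fitted point."""
    return [[float(metric(q, f)) for f in fitted] for q in queries]


def k_nearest(dist_rows, k):
    """For each query row, return the indices of the k smallest distances."""
    result = []
    for row in dist_rows:
        # sorted() is stable, so ties keep their original index order
        order = sorted(range(len(row)), key=lambda j: row[j])
        result.append(order[:k])
    return result


all_states = [['A', 'B', 'Empty'],
              ['A', 'B', 'Empty'],
              ['C', 'D', 'Empty']]
query_points = [['C', 'D', 'Full']]  # made-up query row

dists = cross_distances(query_points, all_states, mismatch_dist)
nearest = k_nearest(dists, k=2)
print(dists)    # shape: (n_queries, n_fitted_samples)
print(nearest)  # indices into all_states
```

The matrix has one row per query and one column per fitted sample, which is the layout a precomputed query expects.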
Upvotes: 0
Reputation: 46968
The help page says:
metric : str or callable, default=’minkowski’
The distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of DistanceMetric for a list of available metrics. If metric is “precomputed”, X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only “nonzero” elements may be considered neighbors.
You can compute the pairwise distances with pdist (see its documentation) and convert them with squareform into the square matrix that is required as input:
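from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import NearestNeighbors

all_states = [['A', 'B', 'Empty'],
              ['A', 'B', 'Empty'],
              ['C', 'D', 'Empty']]

# mydist is the custom metric defined in the question
dm = squareform(pdist(all_states, mydist))
dm
array([[0., 1., 1.],
       [1., 0., 1.],
       [1., 1., 0.]])

# note: kneighbors() needs n_neighbors <= number of fitted samples,
# so use the full data set (or a smaller n_neighbors) when querying
neigh = NearestNeighbors(n_neighbors=5, metric="precomputed")
neigh.fit(dm)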
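If it helps to see what the square matrix actually holds, below is a rough pure-Python sketch of the same idea: evaluate the metric once per pair of rows and mirror the result, leaving zeros on the diagonal. This only illustrates the layout and is not SciPy's implementation; `mismatch_dist` and `pairwise_matrix` are hypothetical helpers, and the metric is swapped for a mismatch count so the values are less trivial than with a metric that always returns 1.

```python
def mismatch_dist(x, y):
    """Hypothetical metric: number of positions where two rows differ."""
    return sum(a != b for a, b in zip(x, y))


def pairwise_matrix(rows, metric):
    """Build the symmetric n x n distance matrix with a zero diagonal."""
    n = len(rows)
    dm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # the metric is evaluated once per unordered pair, then mirrored
            d = float(metric(rows[i], rows[j]))
            dm[i][j] = dm[j][i] = d
    return dm


all_states = [['A', 'B', 'Empty'],
              ['A', 'B', 'Empty'],
              ['C', 'D', 'Empty']]

dm = pairwise_matrix(all_states, mismatch_dist)
for row in dm:
    print(row)
```

Because the metric only ever sees the raw rows, the strings never have to be encoded as numbers; only the resulting matrix is numeric.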
Upvotes: 3
Reputation: 16
As far as I know, ML models need to be trained on numerical data. If your distance metric has a way to convert your strings to numbers, then it will work.
Upvotes: 0