user1566200

Reputation: 1838

SciPy KDTree distance units?

Let's say I have an array where column 1 is in feet, column 2 is in feet, and column 3 is in seconds. For example:

x = [50, 40, 30]

I then have another array, y, with the same units and the same number of columns, but many rows. I turn it into a KDTree with SciPy:

tree = scipy.spatial.KDTree(y)

and then query that tree:

distance, index = tree.query(x, k=1)

By default, I believe the distance is calculated based on the Euclidean norm.

So for example, distance might be:

print distance
[34]

What units are these? Are they still in the original feet, feet, & seconds?

Upvotes: 0

Views: 1503

Answers (1)

Adam Acosta

Reputation: 603

It doesn't return any interpretable unit when the columns measure quantities whose units can't be converted to each other (time and distance, for example). It's returning sqrt(feet**2 + feet**2 + sec**2), which is not a unit of measure. It's still the Euclidean norm, but over an abstract space in this case.
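To make that concrete, here is a minimal sketch showing that the value returned by the query is just the Euclidean norm of the raw column differences. The rows of y are made-up numbers chosen for illustration, not data from the question.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical data; columns are (feet, feet, seconds).
y = np.array([
    [50.0, 40.0, 64.0],
    [10.0, 10.0, 30.0],
    [80.0, 90.0, 31.0],
])
x = np.array([50.0, 40.0, 30.0])

tree = KDTree(y)
distance, index = tree.query(x, k=1)

# Same number computed by hand: square the raw differences, sum, take the root.
# Feet and seconds are mixed together with no unit conversion at all.
manual = np.sqrt(np.sum((y[index] - x) ** 2))

print(distance, index, manual)
```

Here the nearest row differs from x only in the third column, by 34 seconds, so the reported distance is 34, a number with no single physical unit attached.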

This isn't really a Python question, by the way. SciPy just manipulates the numbers you give it and knows nothing about their units. It's more a question of how to interpret the math. For instance, do you want to treat a 5' x 5' box as 'closer' to a 7' x 7' box than to a 6' x 6' box because you happened to measure the first two within seconds of each other and measured the third box hours later? Only you know your data and which features really count when building a similarity score. In the box example, including time doesn't make sense. If you're ranking the similarity of sprinters based on both body size and best 100m time, then it probably does.
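One way to see why the interpretation is up to you: changing the unit of a single column can change which point counts as "nearest". The sketch below uses made-up points, and the per-column scale factor is an illustrative choice rather than anything SciPy prescribes.

```python
import numpy as np
from scipy.spatial import KDTree

# Hypothetical points; columns are (feet, feet, seconds).
y = np.array([
    [50.0, 40.0, 64.0],  # same position as x, 34 seconds later
    [60.0, 40.0, 30.0],  # 10 feet away from x, same time
])
x = np.array([50.0, 40.0, 30.0])

# Time left in seconds: the 10-feet-away point wins (10 < 34).
_, idx_seconds = KDTree(y).query(x, k=1)

# Same data with time expressed in minutes: the 34-second gap shrinks
# to about 0.57, so the other point becomes the nearest neighbour.
scale = np.array([1.0, 1.0, 1.0 / 60.0])
_, idx_minutes = KDTree(y * scale).query(x * scale, k=1)

print(idx_seconds, idx_minutes)
```

If mixing the columns does make sense for your data, rescaling each column before building the tree (for example by a range or spread you consider comparable) is how you decide what "one unit of distance" means.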

Upvotes: 2
