Reputation: 359
There are different methods to calculate distance between two vectors of the same length: Euclidean, Manhattan, Hamming ...
I'm wondering about any method that would calculate distance between vectors of different length.
Upvotes: 18
Views: 68571
Reputation: 841
There is no unique definition of distance if you mix vectors of differing number of elements ("length", "dimensionality"); usually, people just map vectors of one space to the other.
There are many ways to do these mappings:
These are just the most common possibilities one could come up with; I'm sure there are more.
Choose the one that makes the most sense for your application; there is no single "right" way to do it.
Upvotes: 3
Reputation: 106
Padding the shorter array with zeros so that it matches the length of the longer one is not correct in general.
For example, suppose we have two sets (arrays, vectors, ...) of measurements of the same parameter (e.g. temperature, speed, or a binary parameter such as the status of an on/off switch) made at different time instants. Assume that the first set A1 consists of N measurements made at a set of instants T1, whereas the second set A2 consists of M measurements (M ≠ N) taken at a set of instants T2.
Note that the distribution of T2 may differ arbitrarily from that of T1, so padding with zeros does not make sense here.
In this case, I suggest interpolating both sets onto a common set of time instants, say T, as follows:
A1_new = interpolate (T1, A1, T);
A2_new = interpolate (T2, A2, T);
where interpolate(x, y, xq) takes the sample points x, the sampled values y(x), and the query points xq, and returns the interpolated values y(xq).
Now we can compare the same-size sets A1_new and A2_new with any suitable measure, e.g. Euclidean distance.
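A minimal sketch of the interpolation approach in Python, assuming piecewise-linear interpolation and a common grid T that is evenly spaced over the overlap of the two time ranges. Both choices, as well as the helper name `resampled_distance` and the `num_points` parameter, are illustrative assumptions and not part of the answer above.

```python
from bisect import bisect_right
from math import sqrt

def interpolate(x, y, xq):
    """Piecewise-linear interpolation of the samples (x, y) at query points xq.

    Assumes x is strictly increasing, has at least two points, and that
    every query point lies inside [x[0], x[-1]] (no extrapolation).
    """
    out = []
    for q in xq:
        if not x[0] <= q <= x[-1]:
            raise ValueError("query point outside the sampled range")
        # Index of the segment [x[i], x[i + 1]] that contains q.
        i = min(bisect_right(x, q) - 1, len(x) - 2)
        w = (q - x[i]) / (x[i + 1] - x[i])
        out.append(y[i] + w * (y[i + 1] - y[i]))
    return out

def resampled_distance(t1, a1, t2, a2, num_points=50):
    """Euclidean distance between two series after resampling both onto T."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    # Common grid T: restricted to the time range covered by both series.
    start, stop = max(t1[0], t2[0]), min(t1[-1], t2[-1])
    if start >= stop:
        raise ValueError("the two time ranges do not overlap")
    step = (stop - start) / (num_points - 1)
    t = [start + k * step for k in range(num_points)]
    t[-1] = stop  # avoid stepping past the end through rounding
    a1_new = interpolate(t1, a1, t)
    a2_new = interpolate(t2, a2, t)
    return sqrt(sum((u - v) ** 2 for u, v in zip(a1_new, a2_new)))
```

Note that the result depends on `num_points`, so distances are only comparable when they are computed on grids of the same size.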
Upvotes: 0
Reputation: 1
You can try to calculate the average minimum distance between two vectors p and q of dimensions n and m (n ≠ m). The absolute value is needed so that each term is a non-negative distance:
d = 1/n * sum_{i=1..n} min_{j=1..m} |p(i) - q(j)| + 1/m * sum_{j=1..m} min_{i=1..n} |p(i) - q(j)|
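As a rough sketch, the average minimum distance can be written in Python as follows, assuming scalar components and using the absolute difference as the per-element distance. The function name `average_min_distance` is only illustrative.

```python
def average_min_distance(p, q):
    """Sum of the two directed average nearest-value distances between p and q."""
    if not p or not q:
        raise ValueError("both vectors must be non-empty")
    # For each element of p, distance to the closest element of q, averaged over p.
    p_to_q = sum(min(abs(a - b) for b in q) for a in p) / len(p)
    # For each element of q, distance to the closest element of p, averaged over q.
    q_to_p = sum(min(abs(a - b) for a in p) for b in q) / len(q)
    return p_to_q + q_to_p
```

This measure ignores the position of each element, so it effectively compares the two vectors as sets of values rather than as ordered sequences.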
Upvotes: 0
Reputation: 2086
The Euclidean distance formula finds the distance between any two points in Euclidean space.
A point in Euclidean space is also called a Euclidean vector.
You can use the Euclidean distance formula to calculate the distance between vectors of two different lengths.
For vectors of different dimension, the same principle applies.
Suppose a vector of lower dimension also exists in the higher dimensional space. You can then set all of the missing components in the lower dimensional vector to 0 so that both vectors have the same dimension. You would then use any of the mentioned distance formulas for computing the distance.
For example, consider a 2-dimensional vector A in R² with components (a1,a2), and a 3-dimensional vector B in R³ with components (b1,b2,b3). To express A in R³, you would set its components to (a1,a2,0). Then, the Euclidean distance d between A and B can be found using the formula:
d² = (b1 - a1)² + (b2 - a2)² + (b3 - 0)²
d = sqrt((b1 - a1)² + (b2 - a2)² + b3²)
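A short Python sketch of the zero-padding idea, assuming the missing trailing components of the shorter vector are treated as 0. The function name `padded_euclidean` is hypothetical.

```python
from itertools import zip_longest
from math import sqrt

def padded_euclidean(a, b):
    """Euclidean distance after padding the shorter vector with trailing zeros."""
    # zip_longest supplies 0 for the components the shorter vector lacks.
    return sqrt(sum((x - y) ** 2 for x, y in zip_longest(a, b, fillvalue=0)))
```

For instance, `padded_euclidean([1, 2], [1, 2, 2])` compares (1, 2, 0) with (1, 2, 2).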
For your particular case, the components will be either 0 or 1, so all differences will be -1, 0, or 1. The squared differences will then only be 0 or 1.
If you're using integers or individual bits to represent the components, you can use simple bitwise operations instead of some arithmetic (^ means XOR, or exclusive or). The parentheses matter, because in most languages + binds more tightly than ^:
d = sqrt((b1 ^ a1) + (b2 ^ a2) + ... + (b(n-1) ^ a(n-1)) + (b(n) ^ a(n)))
And we're assuming the trailing components of A are 0, so the final formula will be:
d = sqrt((b1 ^ a1) + (b2 ^ a2) + ... + b(n-1) + b(n))
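The XOR version could look like this in Python. This is a sketch under two assumptions: the components are the integers 0 and 1, and, for the packed variant, component i is stored in bit i-1 of an integer so that the missing components of the shorter vector are the implicit zero high bits. Both function names are illustrative.

```python
from math import sqrt

def bit_vector_distance(a, b):
    """Distance between two 0/1 sequences of possibly different lengths."""
    short, long_ = sorted((a, b), key=len)
    # XOR is 1 exactly where the overlapping components differ.
    mismatches = sum(x ^ y for x, y in zip(short, long_))
    # The shorter vector is padded with zeros, so each trailing 1 is a mismatch.
    mismatches += sum(long_[len(short):])
    return sqrt(mismatches)

def packed_distance(a_bits, b_bits):
    """Same distance for vectors packed into non-negative Python ints."""
    # Count the set bits of the XOR, i.e. the Hamming distance.
    return sqrt(bin(a_bits ^ b_bits).count("1"))
```

The value under the square root is the Hamming distance between the padded vectors, so if only the ordering of distances matters, the square root can be skipped.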
Upvotes: 11