What's the best way to find the similarity among these vectors?

Question

v1 = [33, 24, 55, 56]
v2 = [32, 25, 51, 40]
v3 = [ ... ]
v4 = [ ... ]

Normally, to find which vector is the most similar to v1, I would run v1 against the other vectors with a cosine similarity algorithm.
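
For reference, that baseline can be sketched as below. This is only a minimal sketch under stated assumptions: the function name `cosine_similarity` is made up for illustration, and returning 0.0 for an all-zero vector is a convention chosen here, not something from the question.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric sequences."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: an all-zero vector is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)

# The two fully specified vectors from above.
v1 = [33, 24, 55, 56]
v2 = [32, 25, 51, 40]
score = cosine_similarity(v1, v2)
```

The candidate with the highest score is the closest match to v1.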

Now, I have a more complex set of vectors with the structure:

v1 = [ { 'a': 4, 'b': 9, 'c': 12 ... },
       { 'a': 3, 'g': 3, 'b': 33 ... },
       { 'b': 1, 'k': 6, 'n': 19 ... },
       ...
     ]
v2 = [ {}, {}, {} ... ]
v3 = [ {}, {}, {} ... ]
v4 = [ {}, {}, {} ... ]

Given this structure, how would you calculate similarity? A good match would be a vector that shares many keys with v1 and whose values for those keys are close to v1's values.

btilly's answer:

def cosine_sim_complex(v, w):
    '''
    Cosine similarity for vectors that are lists of dicts.
    '''
    def complicated_dot(v, w):
        dot = 0
        # Pair the dicts by position, then multiply the values of shared keys.
        for v_i, w_i in zip(v, w):
            for x in v_i:
                if x in w_i:
                    dot += v_i[x] * w_i[x]
        return float(dot)
    length_v = complicated_dot(v, v) ** 0.5
    length_w = complicated_dot(w, w) ** 0.5
    score = complicated_dot(v, w) / (length_v * length_w)
    return score


v1 = [ {'a':44, 'b':21 }, { 'a': 55, 'c': 22 } ]
v2 = [ {'a':99, 'b':21 }, { 'a': 55, 'c': 22 } ]
cosine_sim_complex(v1, v2)
# approximately 0.9232

btilly · Accepted Answer

You do the same thing in more dimensions.

Previously you had just 4 dimensions. Now you have a much larger set of dimensions, and each index carries a 2-dimensional label: the position in the list and the key in the dict. But the math remains the same. The dot product looks like this (untested code):

def complicated_dot(v, w):
    dot = 0
    # zip pairs the dicts by position and stops at the shorter list.
    for v_i, w_i in zip(v, w):
        for x in v_i:  # iterating over a dict yields its keys
            if x in w_i:
                dot += v_i[x] * w_i[x]
    return dot
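
One way to convince yourself that nothing new is going on: treat every (position, key) pair as one coordinate of a single flat sparse vector, and the nested loop becomes an ordinary sparse dot product. The sketch below is illustrative only; the helper names `flatten` and `sparse_dot` and the sample data are made up here, not taken from the question.

```python
def flatten(v):
    """Turn a list of dicts into one dict keyed by (position, key)."""
    return {(i, key): value for i, d in enumerate(v) for key, value in d.items()}

def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as dicts; missing keys count as 0."""
    return sum(value * b[k] for k, value in a.items() if k in b)

def complicated_dot(v, w):
    """The nested version, repeated here so the comparison is self-contained."""
    dot = 0
    for v_i, w_i in zip(v, w):
        for x in v_i:
            if x in w_i:
                dot += v_i[x] * w_i[x]
    return dot

# Hypothetical sample data in the same shape as the question's vectors.
v = [{'a': 4, 'b': 9}, {'a': 3, 'g': 3}]
w = [{'a': 1, 'c': 7}, {'g': 2, 'a': 5}]

flat = sparse_dot(flatten(v), flatten(w))
nested = complicated_dot(v, w)
```

Both computations give the same number, because a key only contributes when it appears at the same position in both vectors.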

And then you can apply the cosine similarity algorithm that you already know.
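
Putting the pieces together might look like the sketch below. It is untested against real data and rests on a few assumptions: `most_similar` is a hypothetical helper, `v3` is invented sample data, and a vector with zero length is given a score of 0.0 by convention.

```python
import math

def complicated_dot(v, w):
    """Dot product over lists of dicts, matching keys position by position."""
    dot = 0
    for v_i, w_i in zip(v, w):
        for x in v_i:
            if x in w_i:
                dot += v_i[x] * w_i[x]
    return dot

def cosine_sim_complex(v, w):
    """Cosine similarity built on complicated_dot."""
    norm_v = math.sqrt(complicated_dot(v, v))
    norm_w = math.sqrt(complicated_dot(w, w))
    if norm_v == 0 or norm_w == 0:
        return 0.0  # assumption: an empty or all-zero vector matches nothing
    return complicated_dot(v, w) / (norm_v * norm_w)

def most_similar(target, candidates):
    """Return (name, score) for the candidate closest to target."""
    scored = ((name, cosine_sim_complex(target, c)) for name, c in candidates.items())
    return max(scored, key=lambda pair: pair[1])

v1 = [{'a': 44, 'b': 21}, {'a': 55, 'c': 22}]
candidates = {
    'v2': [{'a': 99, 'b': 21}, {'a': 55, 'c': 22}],
    'v3': [{'a': 44, 'b': 20}, {'a': 50, 'c': 25}],  # hypothetical data
}
best_name, best_score = most_similar(v1, candidates)
```

With non-negative values the score always lies between 0 and 1, so anything above 1 points to a bug in the length calculation.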
