sagar .rao

Reputation: 11

Finding the similarity between two lists, with separate weights based on the position of values in the lists

I'm trying to compute a similarity value for one list compared to another, much like the Jaccard similarity of two sentences. The only difference is that a value at the same index in both lists gets a static weight, while a value found at a different index gets a weight that is penalized according to how many places it is away from that index.

a=["are","you","are","you","why"]
b=['you',"are","you",'are',"why"]
li=[]
va=[]
fi=[]
weightOfStatic=1/len(a)
for i in range(len(a)):
    if a[i]==b[i]:
        print("true1", weightOfStatic,a[i],b[i])
        fi.append({"static":i, "dynamic":i,"Weight":weightOfStatic})
        li.append([weightOfStatic,a[i],b[i]])
        # note: this appends the list li, not the number weightOfStatic
        va.append(li)
    else:
        for j in range(len(b)):
            if a[i]==b[j]:
                weightOfDynamic = weightOfStatic*(1-(1/len(b))*abs(i-j))
                fi.append({"static":i, "dynamic":j,"Weight":weightOfDynamic})
                print("true2 and index difference between words =%d"% abs(i-j),weightOfDynamic, i,j)
                li.append([weightOfDynamic,a[i],b[j]])
                va.append(weightOfDynamic)

sim_value=sum(va)
print("The similarity value is = %f" %(sim_value))

The code above works well when the lists don't contain repeated words, for example a=["how","are","you"] and b=["you","are","how"]. For this sentence it gives a similarity value of 0.5.

For the lists a and b above, which do contain repeated words, each value from list a should be matched only with its nearest index in b. This is how the matching is currently done for the above example with the given code:

      {'static': 0, 'dynamic': 1, 'Weight': 0.160}
 here 0 should not be matched again with 3
      {'static': 0, 'dynamic': 3, 'Weight': 0.079}
      {'static': 1, 'dynamic': 0, 'Weight': 0.160}
 same for 1 and 2
      {'static': 1, 'dynamic': 2, 'Weight': 0.160}
 dynamic 1 is already taken here
      {'static': 2, 'dynamic': 1, 'Weight': 0.160}
      {'static': 2, 'dynamic': 3, 'Weight': 0.160}
 dynamic 0 is already taken
      {'static': 3, 'dynamic': 0, 'Weight': 0.079}
      {'static': 3, 'dynamic': 2, 'Weight': 0.160}
      [0.2, 'why', 'why'] 

The total weight here is 1.32, but the similarity should lie between 0 and 1.

Instead, the result should be:

      {'static': 0, 'dynamic': 1, 'Weight': 0.160}
      {'static': 1, 'dynamic': 0, 'Weight': 0.160}
      {'static': 2, 'dynamic': 3, 'Weight': 0.160}
      {'static': 3, 'dynamic': 2, 'Weight': 0.160}
      [0.2, 'why', 'why'] 

The total weight would then be 0.84.

Upvotes: 0

Views: 158

Answers (1)

analphagamma

Reputation: 76

First of all, I "prettified" your code to look more Pythonic. :) I think you over-complicated it a bit. Actually, it didn't even run for me, because you tried to sum a list that contained both numbers and lists.

a = ['are','you','are','you','why']
b = ['you','are','you','are','why']

total_weight = 0
weight_of_static = 1/len(a)
for i, a_word in enumerate(a):
    if a_word == b[i]:
        # same word at the same index: full static weight
        print('{0} <-> {1} => static\t\t// weight: {2:.2f}'.format(a_word, b[i], weight_of_static))
        total_weight += weight_of_static
    else:
        # collect the distance to every position in b that holds the same word
        distances = []
        for j, b_word in enumerate(b):
            if a_word == b_word:
                distances.append(abs(i - j))

        if not distances:
            # a_word does not occur in b at all, so it adds no weight
            continue

        # penalize only by the distance to the closest match
        dynamic_weight = weight_of_static*(1 - ( 1 / len(b)) * min(distances))
        total_weight += dynamic_weight
        print('{0} <-> {1} => not static\t// weight: {2:.2f}'.format(a_word, b[i], dynamic_weight))

print('The similarity value is = {0:.2f}'.format(total_weight))
  • First I declare a total_weight variable to keep track of the weight.
    Then I use the enumerate function so that I have both the index and the element.
  • If the two words are the same at the same index, it's straightforward. :)
  • If not, we loop through the second list as you did, but we have to keep track of all matches in the distances list, because otherwise a[3] would be matched with b[0] even though b[2] is closer.
  • After that we just use your formula to calculate the dynamic weight (I left it a little verbose so you can see it more clearly). The only difference is that we use the smallest distance, min(distances).
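
To make the formula concrete, here is a small sketch that isolates it in a helper. The name dynamic_weight and its parameters are only illustrative and are not part of the code above; it just evaluates your formula for a given pair of indices.

```python
def dynamic_weight(i, j, len_a, len_b):
    """Weight of a word at index i in a that is matched with index j in b."""
    static_weight = 1 / len_a  # weight of a same-index match
    # the penalty grows linearly with the index distance
    return static_weight * (1 - (1 / len_b) * abs(i - j))


# 'are' at a[0] can match b[1] (distance 1) or b[3] (distance 3)
print(round(dynamic_weight(0, 1, 5, 5), 2))  # 0.16
print(round(dynamic_weight(0, 3, 5, 5), 2))  # 0.08
```

Taking the smallest distance therefore keeps the 0.16 match and ignores the 0.08 one.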

This is my sample output:

$ python similarity.py
are <-> you => not static       // weight: 0.16
you <-> are => not static       // weight: 0.16
are <-> you => not static       // weight: 0.16
you <-> are => not static       // weight: 0.16
why <-> why => static           // weight: 0.20
The similarity value is = 0.84   
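
Note that the loop above lets two words in a share the same position in b. If each position in b should be used at most once, as the expected matching in the question suggests, one option is a greedy pass like the sketch below. The function name positional_similarity, the rule of reserving same-index matches first, and breaking ties toward the lower index are my assumptions, not something the question specifies. It also assumes lists of equal length, as in the example.

```python
def positional_similarity(a, b):
    """Greedy one-to-one positional similarity (illustrative sketch).

    Returns (total_weight, matches), where matches is a list of
    (index_in_a, index_in_b, weight) tuples sorted by index_in_a.
    """
    if not a or not b:
        return 0.0, []

    base = 1 / len(a)   # weight of a same-index match
    used = set()        # indices of b that are already matched
    matches = []
    total = 0.0

    # Pass 1 (assumption): same-index matches claim their position first.
    for i, word in enumerate(a):
        if i < len(b) and word == b[i]:
            used.add(i)
            matches.append((i, i, base))
            total += base
    static_indices = {i for i, _, _ in matches}

    # Pass 2: every remaining word takes the nearest unused position in b.
    for i, word in enumerate(a):
        if i in static_indices:
            continue
        candidates = [j for j, other in enumerate(b)
                      if other == word and j not in used]
        if not candidates:
            continue  # no free occurrence left in b: contributes nothing
        # nearest index wins; ties go to the lower index (assumption)
        j = min(candidates, key=lambda k: (abs(i - k), k))
        used.add(j)
        weight = base * (1 - abs(i - j) / len(b))
        matches.append((i, j, weight))
        total += weight

    return total, sorted(matches)


a = ['are', 'you', 'are', 'you', 'why']
b = ['you', 'are', 'you', 'are', 'why']
total, matches = positional_similarity(a, b)
print([(i, j) for i, j, _ in matches])  # [(0, 1), (1, 0), (2, 3), (3, 2), (4, 4)]
print('{0:.2f}'.format(total))          # 0.84
```

For this input the total is the same 0.84, but the pairs now match the ones listed in the question. A greedy pass like this is simple, though it is not guaranteed to find the best overall assignment for every input.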

I hope this helps.

Upvotes: 1
