Programmer

Reputation: 37

Calculating cosine similarity from file vectors in Python

I would like to calculate the cosine similarity between two vectors stored in a file in the following format:

first_vector 1 2 3  
second_vector 1 3 5  

... simply the name of the vector followed by its elements, separated by single spaces. I have defined a function that should take each line as an individual list and then calculate the similarity. My problem is that I do not know how to convert the two lines into two lists.

This is my code:

import math

def cosine_sim(vector1,vector2):

    sum_of_x, sum_of_y, sum_of_xy = 0, 0, 0
    for i in range(len(vector1)):
        x = vector1[i]
        y = vector2[i]
        sum_of_x += x * x
        sum_of_y += y * y
        sum_of_xy += x * y
    return sum_of_xy / math.sqrt(sum_of_x * sum_of_y)


myfile=open("vectors","r")
v1='#This should read the first line vector which is 1 2 3'
v2='#This should read the second line vector which is 1 3 5'
print("The similarity is",cosine_sim(v1,v2))

Upvotes: 0

Views: 567

Answers (1)

Prune

Reputation: 77847

These are basic data-manipulation skills that you're supposed to learn for this assignment. Here are the steps:

Read the entire line into a string.  # myfile.readline()
Split the string on spaces.          # string.split()
Drop the first element.              # list slice or pop()
Convert the others to integer.       # int()

It's possible to stuff all of that into one line of code, but I recommend that you do it in four steps, testing each step as you code it. The last one may be a loop for you, depending on your current skill level.
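The four steps above can be sketched roughly as follows. This is only an illustration: the helper name `parse_vector` is hypothetical, and it assumes every element is an integer, as in the sample file.

```python
def parse_vector(line):
    """Turn a line such as 'first_vector 1 2 3' into [1, 2, 3]."""
    fields = line.split()              # split on any run of whitespace
    values = fields[1:]                # drop the vector name
    return [int(v) for v in values]    # convert the rest to integers


print(parse_vector("first_vector 1 2 3"))   # prints [1, 2, 3]
```

Calling `split()` with no argument also discards trailing spaces and the newline, whereas `split(' ')` would leave empty strings behind if a line ends in extra spaces, and `int('')` raises a `ValueError`.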

Does that get you moving?


INPUT IN PAIRS

To handle pairs of input lines, you can read and split the two lines of each pair individually. Another way is to maintain a boolean flag that tells you whether the current iteration is on the first or the second line of a pair.

One way:

while True:
    line1 = myfile.readline().split()[1:]
    line2 = myfile.readline().split()[1:]
    if not line2:    # end of file (or an unpaired last line)
        break
    # Convert both to numbers; compute cosine

Another way:

first = True
for line in myfile:
    if first:
        line1 = line.split()[1:]
    else:
        line2 = line.split()[1:]
        # Convert both to numbers; compute cosine
    first = not first
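For reference, here is one way the pieces might fit together end to end. Treat it as a sketch under a few assumptions: `read_vector` and `pair_similarities` are hypothetical helper names, the elements are integers, blank lines are skipped, and `io.StringIO` stands in for the real `open("vectors", "r")` so the example runs without a file on disk.

```python
import io
import math


def cosine_sim(vector1, vector2):
    """Cosine similarity of two equal-length numeric sequences."""
    sum_of_x = sum(x * x for x in vector1)
    sum_of_y = sum(y * y for y in vector2)
    sum_of_xy = sum(x * y for x, y in zip(vector1, vector2))
    return sum_of_xy / math.sqrt(sum_of_x * sum_of_y)


def read_vector(line):
    """Drop the leading name and convert the remaining fields to int."""
    return [int(field) for field in line.split()[1:]]


def pair_similarities(lines):
    """Yield one similarity per consecutive pair of non-blank lines."""
    pending = None                      # first vector of the current pair
    for line in lines:
        if not line.strip():
            continue                    # skip blank lines
        if pending is None:
            pending = read_vector(line)
        else:
            yield cosine_sim(pending, read_vector(line))
            pending = None


# io.StringIO stands in for open("vectors", "r") in this sketch.
sample = io.StringIO("first_vector 1 2 3\nsecond_vector 1 3 5\n")
for similarity in pair_similarities(sample):
    print("The similarity is", similarity)
```

Using `None` as the "no first line yet" marker plays the same role as the boolean flag, and an unpaired final line is simply ignored.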

Upvotes: 1
