cosine similarity and sentences

Question

I'm trying to compute cosine similarity on a text file I have: https://lms.uwa.edu.au/bbcswebdav/pid-1143173-dt-content-rid-16133365_1/courses/CITS1401_SEM-2_2018/CITS1401_SEM-2_2018_ImportedContent_20180713092326/CITS1401_SEM-1_2018/Unit%20Content/Resources/Project2_2018/sample.txt

How do I read the file sentence by sentence, rather than line by line with readline()? I'm trying to store each sentence in its own variable. For example:

s1 = "the mississippi is well worth reading about"
s2 = "it is not a commonplace river, but on the contrary is in all ways remarkable"

Is this the right way to go about it? If so, my next step, which I know how to do, is to remove the common words from each sentence and keep only the unique words to compare.

How do I stop at each full stop and store that sentence in a variable while looping through the text?

Thanks

John · Accepted Answer

Do you mean this:

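with open("file.txt", 'r') as in_f:
    # Turn line breaks into spaces so words at line ends don't run together,
    # then split the whole text on full stops.
    sentences = in_f.read().replace('\n', ' ').split('.')
    for s in sentences:
        s = s.strip()  # drop leading/trailing whitespace
        # your code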
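
If it helps, here is a minimal sketch of the whole path from raw text to a similarity score. It is only one way to do it. The helper names (split_sentences, word_counts, cosine_similarity) and the STOP_WORDS set are made up for illustration and are not from your project spec, and the sample text is just the two sentences from your question with a line break added.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; use your project's list.
STOP_WORDS = {"the", "is", "it", "a", "in", "on", "but", "not", "all", "about"}


def split_sentences(text):
    """Split text on full stops, treating line breaks as ordinary spaces."""
    flat = " ".join(text.split())  # collapse newlines and repeated spaces
    return [s.strip() for s in flat.split(".") if s.strip()]


def word_counts(sentence):
    """Lowercase the sentence, drop stop words, and count remaining words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Sample text: the two sentences from the question, split across lines.
text = ("The Mississippi is well worth reading about. It is not a commonplace\n"
        "river, but on the contrary is in all ways remarkable.")

sentences = split_sentences(text)
s1, s2 = sentences[0], sentences[1]
print(cosine_similarity(word_counts(s1), word_counts(s2)))
```

Splitting on '.' alone will also break on abbreviations and decimals such as "Mr." or "3.5", so check the output against your file before relying on it.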
