Yunter

Reputation: 284

Comparing lists with text files

I have the following list: t = ['one', 'two', 'three']

I want to read a file and add a point for every word from the list that exists in the file. E.g. if "one" and "two" exist in "CV.txt", points = 2. If all of them exist, then points = 3.

import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("cv.txt","r").read().lower()

points = 0

for words in t:
    if words in CV:
        #print(words)
        words = nltk.word_tokenize(words)
        print(words)
        li = len(words)
        print(li)
        points = li
        print(points)

Assuming 'CV.txt' contains the words "one" and "two", and it is split into words (tokenized), 2 points should be added to the variable "points".

However, this code returns:

['one']
1
1
['two']
1
1

As you can see, the length is only 1 each time, but the total should be 2. I'm sure there's a more efficient way to do this by iterating over the words rather than using len. Any help with this would be appreciated.

Upvotes: 3

Views: 812

Answers (1)

niraj

Reputation: 18218

I don't think you need to tokenize within the loop; an easier way to do it would be the following:

  • First, tokenize the words in the txt file
  • Then, collect each word that is also in t

Finally, points is the number of words in common_words.

import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("cv.txt","r").read().lower()

points = 0

words = nltk.word_tokenize(CV)
common_words = [word for word in words if word in t]
points = len(common_words)
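As a quick check of the same logic on an inline string (a stand-in for the file contents, since the question's sample file isn't available; str.split() is used here in place of nltk.word_tokenize, which behaves the same for plain space-separated text):

```python
t = ['one', 'two', 'three']

# Hypothetical stand-in for open("cv.txt").read().lower()
cv_text = "one and two appear in this cv".lower()

words = cv_text.split()  # simple whitespace tokenization for this sketch
common_words = [word for word in words if word in t]
points = len(common_words)
print(points)  # 2
```

This counts 2 points because "one" and "two" occur in the text but "three" does not, matching the behavior the question asks for.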

Note: if you want to avoid counting duplicates, use a set of common words instead in the above code:

common_words = set(word for word in words if word in t)
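To see the difference, here is a sketch with hypothetical file contents in which "one" appears twice (again using str.split() in place of nltk.word_tokenize for a self-contained example):

```python
t = ['one', 'two', 'three']

# Hypothetical file contents with a repeated word
cv_text = "one plus one equals two".lower()
words = cv_text.split()  # whitespace tokenization for illustration

# Counting every occurrence vs. counting distinct matches
all_matches = [word for word in words if word in t]          # ['one', 'one', 'two']
distinct_matches = set(word for word in words if word in t)  # {'one', 'two'}

print(len(all_matches))      # 3
print(len(distinct_matches)) # 2
```

The list version awards a point per occurrence, while the set version awards at most one point per word in t.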

Upvotes: 3

Related Questions