Reputation: 284
I have the following list: t = ['one', 'two', 'three']
I want to read a file and add a point for every word that exists in the list. E.g. if "one" and "two" exist in "CV.txt", points = 2. If all of them exist, then points = 3.
import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("cv.txt", "r").read().lower()
points = 0
for words in t:
    if words in CV:
        #print(words)
        words = nltk.word_tokenize(words)
        print(words)
        li = len(words)
        print(li)
        points = li
        print(points)
Assuming 'CV.txt' contains the words "one" and "two", and it is split by words (tokenized), 2 points should be added to the variable "points".
However, this code returns:
['one']
1
1
['two']
1
1
As you can see, the length is only 1, but it should be 2. I'm sure there's a more efficient way to do this by iterating with loops or something rather than using len. Any help with this would be appreciated.
Upvotes: 3
Views: 812
Reputation: 18218
I don't think you need to tokenize within the loop, so an easier way to do it would be the following: tokenize CV once, then collect the words it has in common with t. Finally, the points would be the number of words in common_words.
import nltk
from nltk import word_tokenize

t = ['one', 'two', 'three']
CV = open("untitled.txt", "r").read().lower()

words = nltk.word_tokenize(CV)                        # tokenize the whole text once
common_words = [word for word in words if word in t]  # words from CV that are also in t
points = len(common_words)
Note: if you want to avoid counting duplicates, then you need a set of the common words instead, as follows in the above code:
common_words = set(word for word in words if word in t)
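For example, with a small sample string standing in for the file contents (using a plain split() here just for illustration; word_tokenize behaves the same on simple whitespace-separated text), you can see the difference between the list and set versions when a word repeats:

```python
t = ['one', 'two', 'three']
cv_text = "one two two four"  # sample text standing in for CV.txt

words = cv_text.lower().split()

# list version: duplicates are counted
common_words = [word for word in words if word in t]
print(len(common_words))  # 3 -- "two" is counted twice

# set version: each matching word is counted once
unique_common = set(word for word in words if word in t)
print(len(unique_common))  # 2
```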
Upvotes: 3