Reputation: 11
    def ethos(file):
        """Open a local file, convert its content into tokens. Match tokens with list provided, return matching list."""
        f = open(file)
        raw = f.read()
        tokens = nltk.word_tokenize(raw)
        list = [ 'perfect' ,'companion' , 'good' , 'brilliant', 'good']
        for tokens in list:
            return tokens
I wrote this code expecting it to return all the tokens in the text that match the list defined above, but it returns only one token, and that is the first one in the list. I also tried adding an empty list and appending the matching words to it, but that doesn't seem to work either. If anybody has any ideas, kindly let me know. Please reply soon.
Upvotes: 1
Views: 57
Reputation: 17168
There are a few issues here, but the main point is that a function stops at the first return statement it reaches. Your loop starts going through the items in the list and returns the first one, at which point the function stops executing because it has returned.
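To see this behaviour in isolation, here is a minimal sketch. The function names first_only and collect_all are made up for illustration and do not come from the question:

```python
def first_only(words):
    # The return runs on the first pass through the loop,
    # so the function exits before it ever reaches the second item.
    for word in words:
        return word


def collect_all(words):
    # Collect inside the loop, then return once after the loop has finished.
    collected = []
    for word in words:
        collected.append(word)
    return collected


sample = ['perfect', 'companion', 'good']
print(first_only(sample))   # perfect
print(collect_all(sample))  # ['perfect', 'companion', 'good']
```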
I think what you want is to check each word in the text to see whether it's in your list, and then return all the matching words. To do that, you need to actually perform a comparison somewhere, which you're not doing at the moment. You might rewrite your loop to look something like this:
    # Don't use "list" as a variable name! Also, there's no need for two "good" entries.
    words_to_match = ['perfect' ,'companion' , 'good' , 'brilliant']
    matching_tokens = []
    for token in tokens:
        if token in words_to_match:
            matching_tokens.append(token)  # add the matching token to a list
    return matching_tokens  # finally, return all the tokens that matched
Once you understand what it is that you're doing with the explicit loop above, note that you can rewrite the whole thing as a simple list comprehension:
words_to_match = {'perfect' ,'companion' , 'good' , 'brilliant'} # using a set instead of a list will make the matching faster
return [t for t in tokens if t in words_to_match]
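If you want to try the matching logic without a file or NLTK at hand, something like the following should work as a rough sketch. The name matching_words and the tokenize parameter are my own additions, and str.split is only a crude stand-in for nltk.word_tokenize (it splits on whitespace and does not separate punctuation):

```python
def matching_words(text, words_to_match, tokenize=str.split):
    """Return every token of `text` that appears in `words_to_match`."""
    # Assumption: `tokenize` is any callable that turns a string into a list
    # of tokens; in the original function this would be nltk.word_tokenize.
    wanted = set(words_to_match)  # a set also drops the duplicate 'good'
    return [t for t in tokenize(text) if t in wanted]


text = "a good book is a perfect companion and a good friend"
print(matching_words(text, ['perfect', 'companion', 'good', 'brilliant', 'good']))
# ['good', 'perfect', 'companion', 'good']
```

Note that the result keeps the order of the text and repeats a word each time it occurs, since the comprehension walks over the tokens rather than over the word list.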
Upvotes: 1
Reputation: 117876
I think you meant to do
return [i for i in tokens if i in list]
The way you wrote it, it will iterate through each word in list. But the first thing it does in the loop is return. So all it will do is return the word 'perfect' every time, regardless of what comes back in tokens. So the modified code (assuming everything else functions correctly) would be
    def ethos(file):
        """Open a local file, convert its content into tokens. Match tokens with list
        provided, return matching list."""
        f = open(file)
        raw = f.read()
        tokens = nltk.word_tokenize(raw)
        list = [ 'perfect' ,'companion' , 'good' , 'brilliant', 'good']
        return [i for i in tokens if i in list]
Also, some miscellaneous tips:
- Don't name that variable list, because you are shadowing the built-in name.
- Your variable list could be a set; then you could have O(1) lookup times instead of O(N).
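Both tips can be seen in a few lines. This is only an illustrative sketch; shadowing_demo is a made-up name:

```python
def shadowing_demo():
    # Inside this function the name `list` now refers to our data,
    # not to the built-in type.
    list = ['perfect', 'companion']
    try:
        list('abc')  # would normally build ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)


print(shadowing_demo())  # 'list' object is not callable

# A set literal silently drops the duplicate 'good' and supports
# fast membership tests with `in`.
words_to_match = {'perfect', 'companion', 'good', 'brilliant', 'good'}
print(len(words_to_match))         # 4
print('good' in words_to_match)    # True
print('awful' in words_to_match)   # False
```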
Upvotes: 1