Reputation: 25058
I have an is_noun definition using NLTK:
is_noun = lambda pos: pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS'
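As an aside, the same check can be written more idiomatically as a named function with a set lookup (PEP 8 discourages assigning a lambda to a name). All four Penn Treebank noun tags (NN, NNS, NNP, NNPS) start with "NN", so `pos.startswith("NN")` would also work. This is a minimal sketch; the `NOUN_TAGS` name is hypothetical. It behaves exactly like the lambda above, so it does not change which words get tagged as nouns.

```python
# Penn Treebank noun tags: singular/plural, common/proper.
NOUN_TAGS = frozenset({"NN", "NNS", "NNP", "NNPS"})


def is_noun(pos):
    """Return True if a Penn Treebank POS tag marks a noun."""
    return pos in NOUN_TAGS
```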
Then I use it in this function:
import nltk

def test(text):
    tokenized = nltk.word_tokenize(text)
    nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
    print('Nouns:', nouns)
    return nouns
Then I call the function:
test('When will this long and tedious journey ever end? Like all good')
and get:
Nouns: ['journey']
Then I call the same function with the words in reverse order:
test('good all Like end? ever journey tedious and long this will When')
and get:
Nouns: ['end']
I expected to get the same nouns regardless of word order, but that is not the case. What am I doing wrong?
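To see what actually changed between the two calls, it can help to look at the full tag list rather than only the filtered nouns. Below is a minimal sketch, assuming you already have the `(word, tag)` lists returned by `nltk.pos_tag` for both inputs; the helper names `tags_by_word` and `changed_tags` are hypothetical.

```python
from collections import defaultdict


def tags_by_word(tagged):
    """Map each word to the set of tags it received (a word may occur twice)."""
    result = defaultdict(set)
    for word, tag in tagged:
        result[word].add(tag)
    return dict(result)


def changed_tags(tagged_a, tagged_b):
    """Return {word: (tags_in_a, tags_in_b)} for shared words tagged differently."""
    a, b = tags_by_word(tagged_a), tags_by_word(tagged_b)
    return {
        word: (a[word], b[word])
        for word in a.keys() & b.keys()
        if a[word] != b[word]
    }

# Usage with NLTK (requires the tokenizer and tagger data to be installed):
#   a = nltk.pos_tag(nltk.word_tokenize(text1))
#   b = nltk.pos_tag(nltk.word_tokenize(text2))
#   print(changed_tags(a, b))
```

Comparing per-word tag sets makes it obvious that the words themselves are identical in both inputs and only their assigned parts of speech differ.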
Upvotes: 0
Views: 40
Reputation: 77850
As the comment suggests, word order matters. English is rife with words that can act as multiple parts of speech, depending on placement within a phrase. Consider:
You can cage a swallow.
You cannot swallow a cage.
The second text you present is not a grammatical sentence at all. The best the tagger can do is treat "end" as the direct object of the verb "Like", which makes it a noun in this case. Similarly, "journey" gets tagged as the main verb of the second sequence of words.
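The effect can be illustrated with a deliberately tiny, hypothetical rule-based tagger. This is not how `nltk.pos_tag` works internally (NLTK's default tagger is a trained statistical model); it only shows the principle that the tag of an ambiguous word depends on its neighbours. The word lists and the single context rule are assumptions chosen to handle the "swallow"/"cage" example.

```python
# Hypothetical toy tagger, for illustration only.
AMBIGUOUS = {"swallow", "cage"}   # words that can be a noun or a verb
DETERMINERS = {"a", "an", "the"}
MODALS = {"can", "cannot", "will"}
PRONOUNS = {"you", "i", "we", "they"}


def toy_tag(tokens):
    """Tag tokens using the previous token's tag as the only context."""
    tagged = []
    prev = None
    for tok in tokens:
        low = tok.lower()
        if low in DETERMINERS:
            tag = "DT"
        elif low in MODALS:
            tag = "MD"
        elif low in PRONOUNS:
            tag = "PRP"
        elif low in AMBIGUOUS:
            # Context decides: right after a modal -> verb, otherwise -> noun.
            tag = "VB" if prev == "MD" else "NN"
        else:
            tag = "UNK"
        tagged.append((tok, tag))
        prev = tag
    return tagged


for sentence in ("You can cage a swallow", "You cannot swallow a cage"):
    nouns = [w for w, t in toy_tag(sentence.split()) if t.startswith("NN")]
    print(sentence, "->", nouns)
```

The same word set produces different nouns purely because of order, which is the same kind of behaviour seen with the reversed sentence.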
Upvotes: 1