What am I missing when getting nouns from sentence and reversed sentence using nltk?

Question

I have an is_noun definition using nltk:

is_noun = lambda pos: pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS'

Then I use it in a function:

import nltk

def test(text):
    tokenized = nltk.word_tokenize(text)
    # Keep only the words whose part-of-speech tag marks them as nouns
    nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
    print('Nouns:', nouns)
    return nouns

Then I call the function:

test('When will this long and tedious journey ever end? Like all good')

and get:

Nouns: ['journey']

Then I call the same function with the words of the sentence in reverse order:

test('good all Like end? ever journey tedious and long this will When')

Result:

Nouns: ['end']

I expected to get the same number of nouns, but that is not the case. What am I doing wrong?

Prune · Accepted Answer

Summary: GIGO (Garbage In => Garbage Out).

As the comment suggests, word order matters. English is rife with words that can act as multiple parts of speech, depending on placement within a phrase. Consider:

You can cage a swallow.
You cannot swallow a cage.

The second text you present is not a grammatical sentence by any means. The best the tagger can determine is that "end" may be the direct object of the verb "like", and is therefore a noun in this case. Similarly, "journey" appears to be the main verb of the second sequence of words.
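To see the context dependence for yourself, it can help to print every tag rather than only the nouns. The following is a minimal sketch, assuming NLTK is installed and its tokenizer and tagger data have been downloaded; `nouns_from_tagged` and `show_tags` are hypothetical helper names, not part of NLTK, and the exact tags you get may vary with the tagger model.

```python
# Penn Treebank tags that mark nouns (the same four the question checks).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def nouns_from_tagged(tagged):
    """Return the words whose tag is a noun tag, given (word, tag) pairs."""
    return [word for word, pos in tagged if pos in NOUN_TAGS]


def show_tags(sentences):
    """Print the full tagging and the extracted nouns for each sentence."""
    # Imported here so the pure helper above works without NLTK;
    # this function assumes NLTK and its data are available.
    import nltk

    for sentence in sentences:
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        print(sentence)
        print("  tags: ", tagged)
        print("  nouns:", nouns_from_tagged(tagged))
```

Calling `show_tags(["You can cage a swallow.", "You cannot swallow a cage."])` and then passing in the original and reversed sentences from the question should make it visible which words changed tag when the order changed, instead of only showing the final noun list.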
