Shawn

Reputation: 639

NLTK: filter sentences with specific structures

How can I filter sentences with specific structures using NLTK? For example, suppose we have the following context-free grammar:

  1. S → NP VP
  2. S → Aux NP VP
  3. S → VP
  4. NP → Pronoun
  5. NP → Proper-Noun
  6. NP → Det Nominal
  7. Nominal → Noun
  8. Nominal → Nominal Noun
  9. Nominal → Nominal PP
  10. VP → Verb
  11. VP → Verb NP
  12. VP → VP PP
  13. PP → Prep NP

As can be seen, three types of sentence structures are defined:

  1. S → NP VP
  2. S → Aux NP VP
  3. S → VP

Given the following sentence, I want to know whether it conforms to any of the three sentence structures above.

I am not much for country music but it has the potential for beauty, with its combined inclusions of comedy and sadness.

My question is, how should I do it using NLTK?

Upvotes: 2

Views: 1863

Answers (1)

Slater Victoroff

Reputation: 21914

http://www.nltk.org/book/ch05.html

The chapter linked above should explain everything you need for this. Basically, you first tokenize the sentence (break it up into individual tokens) and then tag each token with the part of speech (PoS) that NLTK assigns to it.

Tagging returns a list of (token, tag) tuples, and there are any number of ways to compare those tuples against the categories in your grammar.

Specific code to guard against faulty future links:

>>> import nltk
>>> from nltk import word_tokenize
>>> text = word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
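
One possible way to do the comparison, sketched under some assumptions: encode the grammar from the question with nltk.CFG.fromstring, treat its word categories (Pronoun, Det, Noun, and so on) as terminals, map each Penn Treebank tag to one of those categories, and ask nltk.ChartParser whether the resulting category sequence has any parse rooted at S. The TAG_TO_CATEGORY table and the matches_grammar helper are hypothetical names invented for this illustration, and the tag mapping is a rough guess rather than anything NLTK provides. The input is an already tagged sentence, so the sketch runs without downloading tagger models.

```python
import nltk

# The grammar from the question. Word categories are written as quoted
# terminals, so the parser works on a sequence of category names.
grammar = nltk.CFG.fromstring("""
S -> NP VP | 'Aux' NP VP | VP
NP -> 'Pronoun' | 'ProperNoun' | 'Det' Nominal
Nominal -> 'Noun' | Nominal 'Noun' | Nominal PP
VP -> 'Verb' | 'Verb' NP | VP PP
PP -> 'Prep' NP
""")
parser = nltk.ChartParser(grammar)

# Assumption: a hand-written, incomplete mapping from Penn Treebank tags
# to the categories used in the grammar above.
TAG_TO_CATEGORY = {
    "PRP": "Pronoun",
    "NNP": "ProperNoun", "NNPS": "ProperNoun",
    "DT": "Det",
    "NN": "Noun", "NNS": "Noun",
    "VB": "Verb", "VBD": "Verb", "VBG": "Verb",
    "VBN": "Verb", "VBP": "Verb", "VBZ": "Verb",
    "MD": "Aux",
    "IN": "Prep", "TO": "Prep",
}

def matches_grammar(tagged):
    """Return True if the tagged sentence has at least one parse as S."""
    categories = [TAG_TO_CATEGORY.get(tag) for _, tag in tagged]
    if None in categories:
        # A tag with no category in the grammar cannot be parsed.
        return False
    # parse() yields trees lazily; one tree is enough to say "conforms".
    return next(parser.parse(categories), None) is not None

# Example input in the same shape that nltk.pos_tag returns.
tagged_sentence = [("I", "PRP"), ("like", "VBP"), ("the", "DT"), ("music", "NN")]
print(matches_grammar(tagged_sentence))
```

To filter a collection, you could keep only the sentences for which matches_grammar(nltk.pos_tag(word_tokenize(sentence))) is true. A grammar this small has no rules for conjunctions, adverbs, or adjectives, so a sentence like the one in the question would be rejected unless the grammar and the tag mapping are extended.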

Upvotes: 2
