Reputation: 639
How can I filter sentences with specific structures using NLTK? For example, suppose we have the following definition of a context-free grammar:
1. S → NP VP
2. S → Aux NP VP
3. S → VP
4. NP → Pronoun
5. NP → Proper-Noun
6. NP → Det Nominal
7. Nominal → Noun
8. Nominal → Nominal Noun
9. Nominal → Nominal PP
10. VP → Verb
11. VP → Verb NP
12. VP → VP PP
13. PP → Prep NP
As can be seen, three types of sentence structures are defined:
1. S → NP VP
2. S → Aux NP VP
3. S → VP
Given the following sentence, I want to know whether it conforms to any of the three sentence structures above.
I am not much for country music but it has the potential for beauty, with its combined inclusions of comedy and sadness.
My question is, how should I do it using NLTK?
Upvotes: 2
Views: 1863
Reputation: 21914
http://www.nltk.org/book/ch05.html
The chapter linked above should explain everything you need for this. Basically, you first tokenize the sentence (break it up into individual tokens) and then tag each token with the part of speech (PoS) that NLTK identifies for it.
Tagging returns a list of (word, tag) tuples, and there are any number of ways to compare those tags to the categories in your grammar.
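For the comparison step, one option is a lookup table from the tagger's output to your grammar's category names. This is only a minimal sketch: the `PENN_TO_CATEGORY` table and the `to_categories` helper are my own assumptions (NLTK does not provide them), the choice of which Penn Treebank tags count as which category is a judgment call, and the tagged input is written by hand so that no tagger model is needed.

```python
# Hypothetical mapping from Penn Treebank tags (the tagset nltk.pos_tag
# uses by default for English) to the categories in the question's grammar.
# Which tags belong to which category is an assumption; adjust as needed.
PENN_TO_CATEGORY = {
    "PRP": "Pronoun",
    "NNP": "Proper-Noun", "NNPS": "Proper-Noun",
    "DT": "Det",
    "NN": "Noun", "NNS": "Noun",
    "MD": "Aux",
    "VB": "Verb", "VBD": "Verb", "VBG": "Verb",
    "VBN": "Verb", "VBP": "Verb", "VBZ": "Verb",
    "IN": "Prep",  # note: IN also covers subordinating conjunctions
}


def to_categories(tagged):
    """Map each (word, tag) pair to a grammar category, or None if unmapped."""
    return [PENN_TO_CATEGORY.get(tag) for _, tag in tagged]


# Hand-written input in the same shape that pos_tag returns.
tagged = [("She", "PRP"), ("reads", "VBZ"), ("the", "DT"), ("book", "NN")]
categories = to_categories(tagged)
print(categories)  # ['Pronoun', 'Verb', 'Det', 'Noun']
```

A `None` in the result means the word's tag has no counterpart in the grammar (adverbs, adjectives and conjunctions, for instance), which already tells you the sentence cannot be derived from it.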
Specific code, in case the link breaks in the future:
>>> import nltk
>>> from nltk import word_tokenize
>>> text = word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
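Once the tags have been translated into your grammar's category names, one way to test whether a sentence fits any of the three S rules is to hand the category sequence to an NLTK chart parser. The sketch below is an assumption about how you might set this up, not the only approach: it writes the grammar's lexical categories as quoted terminals so the parser's "words" are category names, and the `conforms` helper is a hypothetical name of mine.

```python
import nltk

# The question's grammar. Lexical categories are quoted terminals, so the
# input to the parser is a list of category names rather than actual words.
grammar = nltk.CFG.fromstring("""
    S -> NP VP | 'Aux' NP VP | VP
    NP -> 'Pronoun' | 'Proper-Noun' | 'Det' Nominal
    Nominal -> 'Noun' | Nominal 'Noun' | Nominal PP
    VP -> 'Verb' | 'Verb' NP | VP PP
    PP -> 'Prep' NP
""")
parser = nltk.ChartParser(grammar)


def conforms(categories):
    """Return True if the category sequence can be derived from S."""
    try:
        # parse() yields one tree per derivation; one tree is enough.
        return next(iter(parser.parse(categories)), None) is not None
    except ValueError:
        # NLTK raises ValueError when a token is not covered by the grammar.
        return False


print(conforms(["Pronoun", "Verb", "Det", "Noun"]))  # NP VP
print(conforms(["Aux", "Pronoun", "Verb"]))          # Aux NP VP
print(conforms(["Det", "Noun"]))                     # a bare NP, no VP
```

Note that the grammar in the question has no category for adverbs, adjectives or conjunctions, so a sentence like the one given (with "not", "but" and so on) would presumably be rejected unless you extend the grammar or drop those words first.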
Upvotes: 2