rocx

Reputation: 285

NLP: Validate a sentence against a given grammar

I have a corpus of English sentences

sentences = [
    "Mary had a little lamb.",
    "John has a cute black pup.",
    "I ate five apples."
]

and a grammar (for the sake of simplicity)

grammar = ('''
    NP: {<NNP><VBZ|VBD><DT><JJ>*<NN><.>} # NP
    ''')

I wish to filter out the sentences which don't conform to the grammar. Is there a built-in NLTK function which can achieve this? In the above example, the first two sentences follow the pattern of my grammar, but the last one does not.

Upvotes: 2

Views: 810

Answers (2)

alvas

Reputation: 121992

TL;DR

Write a grammar, check that it parses, then iterate through the subtrees and look for the non-terminal you need, e.g. NP.

Code:

import nltk
from nltk import pos_tag, word_tokenize

grammar = ('''
    NP: {<NNP><VBZ|VBD><DT><JJ>*<NN><.>} # NP
    ''')

sentences = [
    "Mary had a little lamb.",
    "John has a cute black pup.",
    "I ate five apples."
]

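# Build the chunker once, before the function that uses it.
chunkParser = nltk.RegexpParser(grammar)

def has_noun_phrase(sentence):
    # Tokenize, POS-tag, then chunk the sentence with the grammar.
    parsed = chunkParser.parse(pos_tag(word_tokenize(sentence)))
    # A top-level subtree labelled 'NP' means the pattern matched.
    for subtree in parsed:
        if isinstance(subtree, nltk.Tree) and subtree.label() == 'NP':
            return True
    return False

for sentence in sentences:
    print(has_noun_phrase(sentence))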

Upvotes: 1

Giang Nguyen

Reputation: 488

NLTK supports POS tagging. You can first apply POS tagging to your sentences and then compare the resulting tag sequence with the pre-defined grammar. Below is an example of using NLTK POS tagging.

[image: example of NLTK POS tagging]
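
The "tag first, then compare" idea in this answer could be sketched roughly as follows. This is only an illustration, not an NLTK built-in: the (word, tag) pairs are written by hand to stand in for tagger output (a real tagger such as nltk.pos_tag may assign different tags), the names tagged_sentences, PATTERN and conforms are made up for the example, and the chunk pattern is translated by hand into a plain regular expression over the space-joined tags, with the final <.> treated as the sentence-final period tag.

```python
import re

# Hand-written (word, tag) pairs standing in for POS tagger output.
# Assumption: a real tagger may produce different tags for these words.
tagged_sentences = {
    "Mary had a little lamb.": [
        ("Mary", "NNP"), ("had", "VBD"), ("a", "DT"),
        ("little", "JJ"), ("lamb", "NN"), (".", "."),
    ],
    "John has a cute black pup.": [
        ("John", "NNP"), ("has", "VBZ"), ("a", "DT"), ("cute", "JJ"),
        ("black", "JJ"), ("pup", "NN"), (".", "."),
    ],
    "I ate five apples.": [
        ("I", "PRP"), ("ate", "VBD"), ("five", "CD"),
        ("apples", "NNS"), (".", "."),
    ],
}

# Hand translation of <NNP><VBZ|VBD><DT><JJ>*<NN><.> into a regex over
# a space-joined tag string such as "NNP VBD DT JJ NN .".
PATTERN = re.compile(r"NNP (?:VBZ|VBD) DT (?:JJ )*NN \.")


def conforms(tagged):
    """Return True if the whole tag sequence matches PATTERN."""
    tags = " ".join(tag for _, tag in tagged)
    # fullmatch requires the entire sentence to fit the pattern,
    # not just a fragment somewhere inside it.
    return PATTERN.fullmatch(tags) is not None


# Keep only the sentences whose tag sequence fits the pattern.
matching = [s for s, tagged in tagged_sentences.items() if conforms(tagged)]
print(matching)
```

Using a full match means a longer sentence that merely contains the pattern somewhere would be rejected, which seems closer to "filter out the sentences which don't conform" than checking for the presence of a chunk.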

Upvotes: 0
