Reputation: 1111
I need to identify certain POS tags before/after certain specified words, for example the following tagged sentence:
[('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
can be abstracted to the form "would be" + Adjective
Similarly:
[('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
is of the form "am able to" + Verb
How can I go about checking for these type of a pattern in sentences. I am using NLTK.
Upvotes: 1
Views: 2305
Reputation: 16995
Assuming you want to check literally for "would" followed by "be", followed by some adjective, you can do this:
def would_be(tagged):
return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
The input is a POS tagged sentence (list of tuples, as per NLTK).
It checks if there are any three elements in the list such that "would" is next to "be" and "be" is next to a word tagged as an adjective ('JJ'). It will return True
as soon as this "pattern" is matched.
You can do something very similar for the second type of sentence:
def am_able_to(tagged):
return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
Here's a driver for the program:
s1 = [('This', 'DT'), ('feature', 'NN'), ('would', 'MD'), ('be', 'VB'), ('nice', 'JJ'), ('to', 'TO'), ('have', 'VB')]
s2 = [('I', 'PRP'), ('am', 'VBP'), ('able', 'JJ'), ('to', 'TO'), ('delete', 'VB'), ('the', 'DT'), ('group', 'NN'), ('functionality', 'NN')]
def would_be(tagged):
return any(['would', 'be', 'JJ'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][1]] for i in xrange(len(tagged) - 2))
def am_able_to(tagged):
return any(['am', 'able', 'to', 'VB'] == [tagged[i][0], tagged[i+1][0], tagged[i+2][0], tagged[i+3][1]] for i in xrange(len(tagged) - 3))
sent1 = ' '.join(s[0] for s in s1)
sent2 = ' '.join(s[0] for s in s2)
print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s1), sent1))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s1), sent1))
print("Is '{1}' of type 'would be' + adj? {0}".format(would_be(s2), sent2))
print("Is '{1}' of type 'am able to' + verb? {0}".format(am_able_to(s2), sent2))
This correctly outputs:
Is 'This feature would be nice to have' of type 'would be' + adj? True
Is 'This feature would be nice to have' of type 'am able to' + verb? False
Is 'I am able to delete the group functionality' of type 'would be' + adj? False
Is 'I am able to delete the group functionality' of type 'am able to' + verb? True
If you'd like to generalize this, you can change whether you're checking the literal words or their POS tag.
Upvotes: 2