Reputation: 151
Given an input sentence that has been POS-tagged using the pos_tag function in NLTK:
[('Veer', 'NNP'), ('Singh', 'NNP'), ('Rathore', 'NNP'), ('auctioned', 'VBD'), ('his', 'PRP$'), ('gigantic', 'JJ'), ('house', 'NN'), ('in', 'IN'), ('New', 'NNP'), ('York', 'NNP'), ('.', '.')]
I need to extract the phrases that follow certain tag patterns, for example 'NNP NNP' or 'JJ NN'. There can be any number of patterns to extract; here we need two, namely 'NNP NNP' and 'JJ NN'.
The output I want for the sentence above is a list of phrases like this:
['Veer Singh Rathore', 'gigantic house', 'New York']
I have tried something like this:
> from nltk import RegexpParser, pos_tag, word_tokenize
>
> grammar = (''' Chunk:{<JJ><NN>|<NNP>+<NNP>} ''')
>
> def pos_and_chunking(question):
>     words = word_tokenize(question)
>     pos_words = pos_tag(words)
>     chunkParser = RegexpParser(grammar)
>     chunked_phrases = chunkParser.parse(pos_words)
>     chunked_phrases.draw()
>     for subtree in chunked_phrases.subtrees():
>         print(subtree)
But the output I am getting is in the form of a tree.
Output:
(S (Chunk Veer/NNP Singh/NNP Rathore/NNP) auctioned/VBD his/PRP$ (Chunk gigantic/JJ house/NN) in/IN (Chunk New/NNP York/NNP) ./.)
(Chunk Veer/NNP Singh/NNP Rathore/NNP)
(Chunk gigantic/JJ house/NN)
(Chunk New/NNP York/NNP)
How can I get a plain list of phrases instead of a tree?
I referred to this link for chunking: https://www.codespeedy.com/chunking-rules-in-nlp/
Upvotes: 2
Views: 460
Reputation: 2270
If you want a simple list of phrases rather than a tree, you can flatten the tree: ignore the overall structure, keep only the subtrees that your grammar labelled as chunks, take the tagged tokens (the leaves) from each one, and build the resulting list from their words.
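A minimal sketch of that flattening is shown below. It reuses the tagged sentence and grammar from the question, so no tokenizer or tagger data is needed. The function name `extract_phrases` and its `label` parameter are made up for illustration and are not part of NLTK.

```python
from nltk import RegexpParser

# Tagged sentence copied from the question.
tagged = [('Veer', 'NNP'), ('Singh', 'NNP'), ('Rathore', 'NNP'),
          ('auctioned', 'VBD'), ('his', 'PRP$'), ('gigantic', 'JJ'),
          ('house', 'NN'), ('in', 'IN'), ('New', 'NNP'), ('York', 'NNP'),
          ('.', '.')]

# Same chunk rule as in the question.
grammar = "Chunk: {<JJ><NN>|<NNP>+<NNP>}"


def extract_phrases(tagged_sentence, grammar, label="Chunk"):
    """Hypothetical helper: return each chunk as a space-joined string."""
    tree = RegexpParser(grammar).parse(tagged_sentence)
    phrases = []
    # Visit only the subtrees carrying the chunk label, skipping the root S.
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        # leaves() yields (word, tag) pairs; keep the words, drop the tags.
        phrases.append(" ".join(word for word, tag in subtree.leaves()))
    return phrases


phrases = extract_phrases(tagged, grammar)
print(phrases)
```

For the sentence in the question this should give the three phrases asked for. Further patterns can be added as extra alternatives, separated by `|`, inside the braces of the grammar.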
Upvotes: 0