Shakesbeery

Reputation: 249

Using custom POS tags for NLTK chunking?

Is it possible to use non-standard part-of-speech tags when writing a chunking grammar in NLTK? For example, I have the following sentence to parse:

complication/patf associated/qlco with/prep breast/noun surgery/diap
independent/adj of/prep the/det use/inpr of/prep surgical/diap device/medd ./pd

Specialized tags such as "medd" or "diap" make it much easier to locate the phrases I need in the text. I assumed that because the grammar is just a regular expression over tags, it would work with any tag set, but when I run the following code, I get an error:

grammar = r'TEST: {<diap>}'
cp = nltk.RegexpParser(grammar)
cp.parse(sentence)

ValueError: Transformation generated invalid chunkstring:
<patf><qlco><prep><noun>{<diap>}<adj><prep><det><inpr><prep>{<diap>}<medd><pd>

I think this has to do with the tags themselves, because NLTK can't generate a tree from them. Is it possible to skip that step and just get the chunked items returned? If NLTK isn't the best tool for this, can anyone recommend another module for chunking text?

I'm developing in Python 2.7.6 with the Anaconda distribution.

Thanks in advance!

Upvotes: 1

Views: 2427

Answers (2)

avi

Reputation: 21

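# Requires the NLTK data packages for tokenizing, tagging and NE chunking
# (download them once with nltk.download()).
import nltk
from nltk import word_tokenize

# Placeholder input; substitute your own text.
example_sent = "The quick brown fox visited sunny London."

# POS tagging
words = word_tokenize(example_sent)
pos = nltk.pos_tag(words)
print(pos)

# Chunking: one or more adjectives followed by one or more nouns
chunk = r'Chunk: {<JJ.?>+<NN.?>+}'
par = nltk.RegexpParser(chunk)
par2 = par.parse(pos)
print('Chunking - ', par2)
print('------------------------------ Parsing the filtered chunks')
# Print only the required chunks
for i in par2.subtrees():
    if i.label() == 'Chunk':
        print(i)
print('------------------------------NER')
# NER
ner = nltk.ne_chunk(pos)
print(ner)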

Upvotes: 0

kavini

Reputation: 143

Yes, it is possible to use custom tags for NLTK chunking; I have done the same. See: How to parse custom tags using nltk.Regexp.parser()

The ValueError and its description suggest that something is wrong with how your grammar or input is formed, so that is what you need to check. You can update the question with those details to get suggestions for corrections.
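
For illustration, here is a minimal sketch of chunking with the custom tags from the question. It assumes NLTK 3 and that the sentence is passed to the parser as a list of (word, tag) tuples; the question does not show how `sentence` was built, so the hand-rolled splitting step below is my own assumption and is not a diagnosis of the original error. The names `raw`, `tree` and `chunks` are just for this example.

```python
import nltk

raw = ("complication/patf associated/qlco with/prep breast/noun surgery/diap "
       "independent/adj of/prep the/det use/inpr of/prep surgical/diap "
       "device/medd ./pd")

# Split each "word/tag" token by hand so the tags keep their original case
# (nltk.tag.str2tuple would upper-case them, and <diap> would no longer match).
sentence = [tuple(tok.rsplit("/", 1)) for tok in raw.split()]

grammar = r"TEST: {<diap>}"
cp = nltk.RegexpParser(grammar)
tree = cp.parse(sentence)

# Collect only the chunked items, ignoring the rest of the tree.
chunks = [subtree.leaves() for subtree in tree.subtrees()
          if subtree.label() == "TEST"]
print(chunks)
```

Walking `tree.subtrees()` and filtering on the chunk label is one way to get just the chunked items back without handling the full tree yourself.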

Upvotes: 1
