NP-chunker value error (Python nltk)

Question

I am building an NLP pipeline based on the Python NLTK book (chapter 7). The first segment of code preprocesses the data correctly, but I am unable to run its output through my NP chunker:

import nltk, re, pprint

#Import Data

data = 'This is a test sentence to check if preprocessing works' 

#Preprocessing

def preprocess(document):
    sentences = nltk.sent_tokenize(document)
    sentences = [nltk.word_tokenize(sent) for sent in sentences] 
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    return sentences

tagged = preprocess(data)
print(tagged)

#regular expression-based NP chunker

grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar) #chunk parser
chunked = []
for s in tagged:
    chunked.append(cp.parse(tagged))
print(chunked)

This is the traceback I get:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/u0084411/Box Sync/Procesmanager DH/Text Mining/Tools/NLP_pipeline.py", line 24, in <module>
    chunked.append(cp.parse(tagged))
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1202, in parse
    chunk_struct = parser.parse(chunk_struct, trace=trace)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 1017, in parse
    chunkstr = ChunkString(chunk_struct)
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in __init__
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 95, in <listcomp>
    tags = [self._tag(tok) for tok in self._pieces]
  File "C:\Users\u0084411\AppData\Local\Continuum\Anaconda3\lib\site-packages\nltk\chunk\regexp.py", line 105, in _tag
    raise ValueError('chunk structures must contain tagged '
ValueError: chunk structures must contain tagged tokens or trees
>>>

What is my mistake here? 'tagged' is tokenized and POS-tagged, so why does the parser not recognize this?

Many thanks! Tom

Answers (1)
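
A likely cause, sketched here without NLTK so that it runs anywhere: preprocess() returns a list of sentences, each of which is a list of (word, tag) tuples, but the loop passes the whole 'tagged' list to cp.parse() instead of the single sentence 's'. The parser then finds a list where it expects a tagged token (a tuple) or a tree. In the sketch, tag_of() and tags_in() are hypothetical, simplified stand-ins for the per-token check visible in the traceback (they ignore the tree case), and the tags are hand-written for illustration rather than real tagger output.

```python
# Hand-written stand-in for the output of preprocess(): a list of sentences,
# each sentence a list of (word, tag) tuples. Tags are illustrative only.
tagged = [[("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("test", "NN")]]


def tag_of(token):
    """Hypothetical, simplified version of the parser's per-token check."""
    if isinstance(token, tuple):
        return token[1]
    raise ValueError("chunk structures must contain tagged tokens or trees")


def tags_in(chunk_struct):
    """Collect the tag of every element of the structure handed to the parser."""
    return [tag_of(token) for token in chunk_struct]


# Passing one sentence at a time works: every element is a (word, tag) tuple.
sentence_tags = [tags_in(s) for s in tagged]

# Passing the whole list fails: its only element is a list, not a tuple.
try:
    tags_in(tagged)
    whole_list_error = None
except ValueError as err:
    whole_list_error = str(err)

print(sentence_tags)
print(whole_list_error)
```

If this is indeed the problem, changing the loop body to chunked.append(cp.parse(s)) should hand the parser one tagged sentence per call.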
