Reputation: 520
The NLTK docs show the following output when displaying a tree (in this case, 'entities'):
import nltk
sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good."""
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
entities = nltk.chunk.ne_chunk(tagged)
entities
Tree('S', [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'),
('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN'),
Tree('PERSON', [('Arthur', 'NNP')]),
('did', 'VBD'), ("n't", 'RB'), ('feel', 'VB'),
('very', 'RB'), ('good', 'JJ'), ('.', '.')])
But when I try to do the exact same thing with the exact same code, this is what happens:
entities
(S
At/IN
eight/CD
o'clock/NN
on/IN
Thursday/NNP
morning/NN
(PERSON Arthur/NNP)
did/VBD
n't/RB
feel/VB
very/RB
good/JJ
./.)
In case you haven't caught on, I would like the output of my code (which is the exact same code) to be formatted like the output of the code from the docs.
I have tried this on both Python 2.7 and Python 3.5, with the same results. Is there a fix? Perhaps I'm just missing some NLTK add-on? If there is a fix, I'd prefer Python 2.7.
Upvotes: 0
Views: 146
Reputation: 50190
Are you sure you just typed entities to get the result you report? What you see on the NLTK homepage is the unambiguous representation of the tree object (its “repr” form, in Python terms). You get it when you dump a variable by just typing its name at the prompt. If you print the tree with print(entities) (which is probably what you actually did), you get the customized, more readable form without the Tree types and tuple notation.
So there is no problem, and no fix is needed. These are two representations of the same object. If you have to use print to see the variable's content (e.g., you are not at the interactive prompt) but you want to match the output you see in the example, you can use print(repr(entities)).
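To make the difference concrete without needing NLTK installed, here is a minimal sketch. MiniTree is a hypothetical stand-in I made up for illustration, not NLTK's actual Tree class; it only assumes that a tree defines both __repr__ and __str__, which is the mechanism described above.

```python
class MiniTree(list):
    """Hypothetical stand-in for a tree: a label plus a list of children."""

    def __init__(self, label, children):
        list.__init__(self, children)
        self.label = label

    def __repr__(self):
        # Unambiguous form: what the interactive prompt echoes for a bare name.
        return "MiniTree(%r, %s)" % (self.label, list.__repr__(self))

    def __str__(self):
        # Readable bracketed form: what print() shows.
        parts = []
        for child in self:
            if isinstance(child, MiniTree):
                parts.append(str(child))
            else:
                word, tag = child
                parts.append("%s/%s" % (word, tag))
        return "(%s %s)" % (self.label, " ".join(parts))


entities = MiniTree("S", [MiniTree("PERSON", [("Arthur", "NNP")]), ("did", "VBD")])

print(entities)        # goes through __str__
print(repr(entities))  # goes through __repr__, like typing the name at the prompt
```

The first print gives the bracketed word/tag form and the second gives the constructor-style form, from the same object. The sketch runs unchanged on Python 2.7 and 3.x.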
Upvotes: 2