Reputation: 15683
I get the tokens and noun phrases with:
import spacy
nlp = spacy.load("en_core_web_sm")  # assumed model; any pipeline with a parser works
text = "This is commonly referred to as global warming or climate change."
doc = nlp(text)
for token in doc:
    print(token.i, token.text)
print("Noun phrases:", [chunk.text for chunk in doc.noun_chunks])
and the result is
0 This
1 is
2 commonly
3 referred
4 to
5 as
6 global
7 warming
8 or
9 climate
10 change
11 .
Noun phrases: ['global warming', 'climate change']
Is it possible to get the token indices of the noun phrases instead of their text? For example:
Noun phrases: ['6,7', '9,10']
Upvotes: 2
Views: 147
Reputation: 627545
You may use the Span's start and end properties:
start (int): The index of the first token of the span.
end (int): The index of the first token after the span.
Since end is exclusive, subtract 1 to get the index of the last token in each chunk:
print("Noun phrases:", [(chunk.start, chunk.end - 1) for chunk in doc.noun_chunks])
# => Noun phrases: [(6, 7), (9, 10)]
Or, if you need comma-separated string items,
["{},{}".format(chunk.start, chunk.end - 1) for chunk in doc.noun_chunks]
# => ['6,7', '9,10']
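Note that start and end - 1 only give the first and last token of a chunk. If a noun phrase can span more than two tokens and you want every index, you can iterate over the span itself. Below is a minimal sketch of that idea. To keep it runnable without downloading a model, it assumes a tokenizer-only pipeline from spacy.blank("en") and uses hand-made slices in place of doc.noun_chunks; token_indices is a hypothetical helper name, not part of spaCy.

```python
import spacy

# Tokenizer-only pipeline: no model download needed, but also no parser,
# so doc.noun_chunks is unavailable here.
nlp = spacy.blank("en")
doc = nlp("This is commonly referred to as global warming or climate change.")


def token_indices(span):
    """Return the index of every token in the span (hypothetical helper)."""
    return [token.i for token in span]


# Hand-made slices standing in for the chunks yielded by doc.noun_chunks.
phrases = [doc[6:8], doc[9:11]]

for span in phrases:
    # span.end is exclusive, so the last token index is span.end - 1.
    print(span.text, token_indices(span), (span.start, span.end - 1))
```

With a pipeline that includes a parser, the same helper should work unchanged on each chunk in doc.noun_chunks, since those are Span objects as well.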
Upvotes: 2