Reputation: 23
I have chunked some basic noun phrases; however, the basic noun phrases alone are not sufficient for me. I want to do something more: split the sentence at the end of each chunked noun phrase.
For example:
sentence = 'protection of system resources against bad behavior'
Chunked noun phrases are (by using doc.noun_chunks in spaCy):
protection, system resources, bad behavior
My desired result:
protection, of system resources, against bad behavior
This means I need to split the sentence at the end of each chunked phrase, e.g., after "protection" and after "system resources".
--Can the split() work in this way?
--Or maybe I can continue to use the rule-based match in spaCy to find .head or the immediate left/right words and match them?
Does anyone have this experience?
Thanks!
Upvotes: 0
Views: 178
Reputation: 42302
--Can the split() work in this way?
No.
--Or maybe I can continue to use the rule-based match in spaCy to find .head or the immediate left/right words and match them?
According to its documentation, noun_chunks returns an iterator of Span objects. Spans have start/end character indices, so you could use that information to split the source string, e.g.
output = []
prev_end = 0
for span in doc.noun_chunks:
    # slice up to the end of the chunk, then strip the leading space
    output.append(sentence[prev_end:span.end_char].lstrip())
    prev_end = span.end_char
or something along those lines (you may need to adjust the code as I've never actually used spaCy, I'm just going from what I understand of the docs)
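To illustrate the slicing logic without needing a spaCy model installed, here is a minimal sketch with the chunk end offsets hardcoded for this example sentence (in practice, span.end_char from doc.noun_chunks would supply them):

```python
sentence = 'protection of system resources against bad behavior'

# end_char offsets of the chunks "protection", "system resources",
# "bad behavior" -- hardcoded here for demonstration only
chunk_ends = [10, 30, 51]

output = []
prev_end = 0
for end in chunk_ends:
    # slice up to the chunk end and strip the leading space
    output.append(sentence[prev_end:end].lstrip())
    prev_end = end

print(output)
# ['protection', 'of system resources', 'against bad behavior']
```

This matches the desired result in the question: each piece starts with the preposition that precedes the following noun chunk.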
Upvotes: 1