How to split a sentence at each specified character/string?

Question

I have extracted some basic noun phrases by chunking, but the noun phrases alone are not sufficient for me. I also want to split the sentence at the end of each chunked noun phrase.

For example:

sentence = 'protection of system resources against bad behavior'

Chunked noun phrases are (by using doc.noun_chunks in spaCy):

protection, system resources, bad behavior

My desired result:

protection, of system resources, against bad behavior

This means, I need to split the sentence at the end of each chunked phrase, e.g., at the end of "protection", at the end of "system resources".

--Can the split() work in this way?

--Or maybe I can continue to use the rule-based match in spaCy to find .head or immediate left/right words and match them?

Does anyone have this experience?

Thanks!

Masklinn · Accepted Answer

--Can the split() work in this way?

No.

--Or maybe I can continue to use the rule-based match in spaCy to find .head or immediate left/right words and match them?

According to its documentation, noun_chunks returns an iterator of Span. Spans have start / end indices, so you could use that information to split the source string e.g.

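output = []
prev_end = 0
for span in doc.noun_chunks:
    # Slice from the end of the previous chunk to the end of this one.
    output.append(sentence[prev_end:span.end_char].strip())
    prev_end = span.end_char
# Keep any text that follows the last noun chunk.
if sentence[prev_end:].strip():
    output.append(sentence[prev_end:].strip())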

or something along those lines (you may need to adjust the code as I've never actually used spaCy, I'm just going from what I understand of the docs)
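To make the slicing idea easier to try without installing spaCy, here is a minimal sketch in plain Python. The helper name split_after_offsets and its signature are hypothetical. The end offsets are written out by hand for the example sentence and stand in for the span.end_char values that doc.noun_chunks would presumably provide.

```python
def split_after_offsets(sentence, end_offsets):
    """Split `sentence` after each character offset in `end_offsets`.

    `end_offsets` stands in for the `span.end_char` values of the noun
    chunks; this helper is illustrative, not part of spaCy.
    """
    pieces = []
    prev_end = 0
    for end in sorted(end_offsets):
        # Take everything from the previous cut up to this chunk's end.
        piece = sentence[prev_end:end].strip()
        if piece:
            pieces.append(piece)
        prev_end = end
    # Keep whatever follows the last chunk, if anything.
    tail = sentence[prev_end:].strip()
    if tail:
        pieces.append(tail)
    return pieces


sentence = 'protection of system resources against bad behavior'
# Hand-computed end offsets of "protection", "system resources" and
# "bad behavior"; with spaCy they would come from
# [span.end_char for span in doc.noun_chunks].
end_offsets = [10, 30, 51]
print(split_after_offsets(sentence, end_offsets))
# ['protection', 'of system resources', 'against bad behavior']
```

Stripping each piece removes the space left over at each cut, and the final check keeps any words that come after the last noun chunk.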
