Reputation: 190
I'm trying to split a long text into smaller parts, so that every part is at least N characters long and ends with one of the stop punctuation marks (? . !). If a part reaches N characters before a punctuation mark appears, it continues until the next punctuation mark.
For example:
Let's say N = 10
Do you want lime? Yes. I love when I drink tequila.
This text should be divided into two parts:
[1] Do you want lime?
[2] Yes. I love when I drink tequila.
Upvotes: 2
Views: 356
Reputation: 2162
Maybe like this? (Thanks to KennyTM for final optimizations.)
.{10}[^.?!]*[.?!]+
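For reference, a minimal Python sketch of how this pattern could be applied. The question does not name a language, so Python and the helper name split_min_length are assumptions. The strip() call is an extra step that is not part of the pattern: every match after the first begins with the space that followed the previous part.

```python
import re

def split_min_length(text, n=10):
    """Hypothetical helper: split text into parts of at least n characters,
    each ending in one or more of . ? !"""
    # n arbitrary characters, then everything up to the next stop mark,
    # then the run of stop marks itself.
    pattern = re.compile(r".{%d}[^.?!]*[.?!]+" % n)
    return [part.strip() for part in pattern.findall(text)]

parts = split_min_length("Do you want lime? Yes. I love when I drink tequila.")
for i, part in enumerate(parts, 1):
    print("[%d] %s" % (i, part))
```

Note that a trailing remainder shorter than N characters, or one with no closing stop mark, is not matched by the pattern and is therefore missing from the result.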
Upvotes: 2
Reputation: 336148
.{10,}?[.!?]+\s*
should work. It will also keep repeated punctuation characters together, so it splits
Do you want lime??? Yes. I love when I drink tequila.
into
Do you want lime???
and
Yes. I love when I drink tequila.
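The repeated-punctuation behaviour can be checked with a short Python sketch (Python is an assumption, since the thread names no language, and split_lazy is a hypothetical helper name). Because of the trailing \s*, each match swallows the whitespace after the stop marks, so the parts are stripped before being returned.

```python
import re

def split_lazy(text, n=10):
    # Lazily take at least n characters, stop at the first run of stop marks
    # after that, then consume any whitespace that follows.
    pattern = re.compile(r".{%d,}?[.!?]+\s*" % n)
    return [m.strip() for m in pattern.findall(text)]

result = split_lazy("Do you want lime??? Yes. I love when I drink tequila.")
print(result)
```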
However, it doesn't take quoted speech into account and will break
Peter said "Hi! How about dinner tonight?" and left.
into
Peter said "Hi!
How about dinner tonight?
and
" and left.
Could that be a problem that needs to be taken into account?
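To see exactly where the pattern breaks the quoted-speech example, here is a small Python sketch (again assuming Python; the names PATTERN and pieces are illustrative only). It records the start and end offset of every match, which makes it visible that two of the breaks land inside the quotation.

```python
import re

PATTERN = re.compile(r".{10,}?[.!?]+\s*")
text = 'Peter said "Hi! How about dinner tonight?" and left.'

# Collect (start, end, matched text) so the break positions are visible.
pieces = [(m.start(), m.end(), m.group()) for m in PATTERN.finditer(text)]
for start, end, chunk in pieces:
    print(start, end, repr(chunk))
```

The matches are left unstripped here on purpose, so the whitespace consumed by \s* stays visible in the output.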
Upvotes: 2