Ilija

Reputation: 190

Is it possible to solve this problem with a regular expression?

I'm trying to divide a long text into small parts, so that every part is at least N characters long and ends with one of the stop punctuation marks (? . !). Once a part has reached N characters, it continues until the next punctuation mark appears.

For example:

Let's say N = 10

Do you want lime? Yes. I love when I drink tequila. 

This text should be divided into two parts.

[1] Do you want lime?
[2] Yes. I love when I drink tequila.

Upvotes: 2

Views: 356

Answers (2)

Thomas

Reputation: 2162

Maybe like this? (Thanks to KennyTM for final optimizations.)

.{10}[^.?!]*[.?!]+
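
A minimal sketch of how this pattern could be applied, assuming Python's re module (the question does not name a language). The function name split_min_length and the parameter n are illustrative only, and re.DOTALL is an added assumption so that parts may span line breaks.

```python
import re

def split_min_length(text, n=10):
    # n characters of any kind, then everything up to the next stop mark,
    # then one or more stop marks.
    pattern = re.compile(r".{%d}[^.?!]*[.?!]+" % n, re.DOTALL)
    # Every match after the first starts with the separating space,
    # so strip the surrounding whitespace from each part.
    return [part.strip() for part in pattern.findall(text)]

parts = split_min_length("Do you want lime? Yes. I love when I drink tequila. ")
print(parts)  # ['Do you want lime?', 'Yes. I love when I drink tequila.']
```

Note that the separating space is counted as one of the n characters of the following part, so a part can come out one character shorter than n after stripping.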

Upvotes: 2

Tim Pietzcker

Reputation: 336148

.{10,}?[.!?]+\s*

should work. It will also keep repeated punctuation characters together, so it splits Do you want lime??? Yes. I love when I drink tequila. into [1] Do you want lime??? and [2] Yes. I love when I drink tequila.

However, it doesn't take quoted speech into account and will break Peter said "Hi! How about dinner tonight?" and left. into three parts: [1] Peter said "Hi! [2] How about dinner tonight? [3] " and left.

Could that be a problem that needs to be taken into account?
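
Both behaviours can be checked with a short sketch, assuming Python's re module (the question does not name a language). The helper name split_parts is hypothetical, N is fixed at 10 as in the question, and re.DOTALL is an added assumption so that parts may span line breaks.

```python
import re

# Lazily take at least 10 characters, stop at the first run of stop marks,
# then swallow any whitespace that follows.
PATTERN = re.compile(r".{10,}?[.!?]+\s*", re.DOTALL)

def split_parts(text):
    # Strip the trailing whitespace that \s* pulled into each match.
    return [m.group().strip() for m in PATTERN.finditer(text)]

repeated = split_parts("Do you want lime??? Yes. I love when I drink tequila.")
quoted = split_parts('Peter said "Hi! How about dinner tonight?" and left.')

print(repeated)  # ['Do you want lime???', 'Yes. I love when I drink tequila.']
print(quoted)    # ['Peter said "Hi!', 'How about dinner tonight?', '" and left.']
```

The second result shows the quoted sentence being cut at the punctuation inside the quotation marks, since the pattern has no notion of quotes.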

Upvotes: 2
