Patrick Burns

Reputation: 1833

Including the delimiter in the re.split result

How can I keep the delimiter attached to each item in the re.split result?

For example, I have this text:

Bla bla lbaa dsad asd as. Asd qe as!  ASDadf asd! Dsss dwq. Dkmef? 

Regex:

re.split(r'\s*([\.!\?]+)\s*', data)

And re.split returns this:

['Bla bla lbaa dsad asd as', '.', 'Asd qe as', '!', 'ASDadf asd', '!', 'Dsss dwq', '.', 'Dkmef', '?', '']

But I want this:

['Bla bla lbaa dsad asd as.', 'Asd qe as!', 'ASDadf asd!', 'Dsss dwq.']

How can I do this without hacks?

Thanks

Upvotes: 3

Views: 135

Answers (1)

Pavel Anossov

Reputation: 62908

You can try splitting on whitespace that is preceded by punctuation:

In [9]: re.split(r'(?<=[\.!\?])\s+', data)
Out[9]:
['Bla bla lbaa dsad asd as.',
 'Asd qe as!',
 'ASDadf asd!',
 'Dsss dwq.',
 'Dkmef?']
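
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch. It assumes `data` is exactly the sample text from the question, including the double space and the trailing space. The `strip()` call is my addition: without it, the trailing space after the last `?` is also a split point, so the list ends with an empty string.

```python
import re

# Sample text from the question (note the double space and the trailing space).
data = "Bla bla lbaa dsad asd as. Asd qe as!  ASDadf asd! Dsss dwq. Dkmef? "

# Split on runs of whitespace that directly follow '.', '!' or '?'.
# The lookbehind consumes nothing, so the punctuation stays with its sentence.
# strip() removes the trailing space, which would otherwise produce a final ''.
sentences = re.split(r'(?<=[.!?])\s+', data.strip())

print(sentences)
# ['Bla bla lbaa dsad asd as.', 'Asd qe as!', 'ASDadf asd!', 'Dsss dwq.', 'Dkmef?']
```

Inside a character class `.`, `!` and `?` do not need escaping, so `[.!?]` is equivalent to `[\.!\?]`.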

Explanation from the documentation for the re module:

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in abcdef, since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
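
To make the fixed-length restriction concrete, here is a small sketch. The helper name `lookbehind_compiles` is hypothetical (it is not part of the re module); it only reports whether Python's re accepts a pattern.

```python
import re

def lookbehind_compiles(pattern):
    """Return True if re can compile the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Fixed-width lookbehinds are accepted.
print(lookbehind_compiles(r'(?<=abc)def'))   # True
print(lookbehind_compiles(r'(?<=a|b)c'))     # True

# Variable-width lookbehinds are rejected when the pattern is compiled.
print(lookbehind_compiles(r'(?<=a*)b'))      # False
print(lookbehind_compiles(r'(?<=a{3,4})b'))  # False

# The fixed-width case from the documentation: the match is 'def', not 'abcdef'.
print(re.search(r'(?<=abc)def', 'abcdef').group())  # def
```

This is why the split pattern above uses `(?<=[.!?])`: a character class always matches exactly one character, so it is a valid lookbehind.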

Upvotes: 4
