Reputation: 1833
How can I include the delimiter in the re.split result?
For example, I have this text:
Bla bla lbaa dsad asd as. Asd qe as! ASDadf asd! Dsss dwq. Dkmef?
Regex:
re.split(r'\s*([\.!\?]+)\s*', data)
And re.split returns this:
['Bla bla lbaa dsad asd as', '.', 'Asd qe as', '!', 'ASDadf asd', '!', 'Dsss dwq', '.', 'Dkmef', '?', '']
But I want this:
['Bla bla lbaa dsad asd as.', 'Asd qe as!', 'ASDadf asd!', 'Dsss dwq.']
How can I do it without hacks?
Thanks
Upvotes: 3
Views: 135
Reputation: 62908
You can try splitting on whitespace that is preceded by punctuation:
In [9]: re.split(r'(?<=[\.!\?])\s+', data)
Out[9]:
['Bla bla lbaa dsad asd as.',
'Asd qe as!',
'ASDadf asd!',
'Dsss dwq.',
'Dkmef?']
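For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch using the sample text from the question. The name `data` comes from the question; `sentences` is only an illustrative name. The backslashes inside the character class are dropped here because `.`, `!` and `?` are literal inside `[...]` anyway.

```python
import re

# Sample text from the question.
data = "Bla bla lbaa dsad asd as. Asd qe as! ASDadf asd! Dsss dwq. Dkmef?"

# Split on runs of whitespace, but only where the character just before
# the whitespace is '.', '!' or '?'. The lookbehind itself consumes
# nothing, so the punctuation stays attached to the preceding sentence.
sentences = re.split(r'(?<=[.!?])\s+', data)

print(sentences)
```

Note that the last sentence ('Dkmef?') is kept as well, since there is no trailing whitespace after it to split on.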
Explanation from the documentation for the re module:
(?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
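The fixed-length restriction quoted above can be checked with a short sketch. The variable names `m` and `variable_width_ok` are only illustrative, and the exact error message may differ between Python versions.

```python
import re

# Fixed-width lookbehind: 'def' matches only when 'abc' comes right before it.
m = re.search(r'(?<=abc)def', 'abcdef')
print(m.group(0))

# A variable-width lookbehind such as (?<=a*) is rejected when the
# pattern is compiled, before any matching takes place.
try:
    re.compile(r'(?<=a*)b')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('rejected:', exc)
```

This is why the split pattern uses a single character class in the lookbehind: [.!?] always matches exactly one character, so it is allowed.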
Upvotes: 4