Reputation: 1396
I'm reading in text from a PDF and am looking to split a string based on (a number)
and keep that value in the split string. So the string:
Some sentence. (1) Another Sentence. (2) Final Sentence.
would turn into:
Some sentence.
(1) Another Sentence.
(2) Final Sentence.
I've tried to do this with thestring.split('(')
as a workaround, but some of the sentences contain parentheses of their own, which causes problems. Thanks!
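To make the problem concrete, here is a small sketch of what plain str.split('(') does. The input string is a made-up example (not from the PDF) with a parenthetical aside inside a sentence. Two things go wrong: the "(" delimiter is removed from every piece, and the aside causes an extra, unwanted split.

```python
# Hypothetical input: one sentence contains an ordinary parenthetical aside
text = "Some (aside) text. (1) Next."

# str.split drops the delimiter and splits on EVERY "(", not just "(<number>)"
pieces = text.split('(')
print(pieces)  # ['Some ', 'aside) text. ', '1) Next.']
```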
Upvotes: 0
Views: 389
Reputation: 118
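import re

text = "Some sentence. (1) Another Sentence. (2) Final Sentence."
m = re.search(r'\([0-9]\).*\.', text)
# regex: escaped parens around ONE digit from 0-9, then any sequence of
# characters (greedy), ending with an escaped (literal) dot
# process the match object however you want, e.g.:
if m:
    print(m.group())  # '(1) Another Sentence. (2) Final Sentence.'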
For building and testing regexes, I would use Rubular.
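Note that re.search only returns the first match, and the greedy .* runs to the last dot, so on its own it does not produce the split. A minimal sketch of one way to build on the search idea, assuming markers are always "(" + one or more digits + ")": use re.finditer to find where each marker starts and slice the string at those offsets. The helper name split_on_markers is hypothetical.

```python
import re

def split_on_markers(text):
    # Start offsets of every "(<digits>)" marker; plain "(aside)" text is ignored
    starts = [m.start() for m in re.finditer(r'\(\d+\)', text)]
    bounds = [0] + starts + [len(text)]
    # Slice between consecutive offsets and trim surrounding whitespace
    parts = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    # Drop the empty piece produced when the text starts with a marker
    return [p for p in parts if p]

print(split_on_markers("Some sentence. (1) Another Sentence. (2) Final Sentence."))
# ['Some sentence.', '(1) Another Sentence.', '(2) Final Sentence.']
```

Because the slices start exactly at each marker, the "(1)", "(2)", ... stay attached to the sentence that follows them.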
Upvotes: 1
Reputation: 521269
I would split on the regex pattern \s+(?=\(\d+\)):
import re

inp = "Some sentence. (1) Another Sentence. (2) Final Sentence."
parts = re.split(r'\s+(?=\(\d+\))', inp)
print(parts)
This prints:
['Some sentence.', '(1) Another Sentence.', '(2) Final Sentence.']
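The regex pattern used here says to split on one or more whitespace characters that are followed by something like (1), that is, one or more digits inside parentheses. Because (?=...) is a lookahead, the (1) itself is not consumed by the split and stays at the start of the next piece.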
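A quick check of the claim that ordinary parentheses are left alone, using a made-up input that also has a multi-digit marker. Compiling the pattern once with re.compile is just a convenience if you are splitting many strings from the PDF; it behaves the same as calling re.split directly.

```python
import re

# Same pattern as above: whitespace that is followed by "(<digits>)"
marker_split = re.compile(r'\s+(?=\(\d+\))')

# Hypothetical input: an aside in parentheses and two-digit markers
sample = "Some (aside) text. (10) Next. (11) Last."
print(marker_split.split(sample))
# ['Some (aside) text.', '(10) Next.', '(11) Last.']
```

The "(aside)" does not trigger a split because \d+ requires digits between the parentheses.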
Upvotes: 2
Reputation: 54148
You can use (?<=\.)\s, which means "a whitespace character preceded by a dot":
import re

value = "Some sentence. (1) Another Sentence. (2) Final Sentence."
res = re.split(r"(?<=\.)\s", value)
print(res)  # ['Some sentence.', '(1) Another Sentence.', '(2) Final Sentence.']
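One thing to be aware of: this lookbehind splits after every period that is followed by whitespace, not only before numbered markers. With the sample string from the question the two approaches happen to agree, but on a (hypothetical) input with two sentences before the first marker they differ:

```python
import re

value = "First point. More detail. (1) Next point."

# Splits at every ". " sentence boundary
by_period = re.split(r"(?<=\.)\s", value)
# Splits only where whitespace is followed by "(<digits>)"
by_marker = re.split(r"\s+(?=\(\d+\))", value)

print(by_period)  # ['First point.', 'More detail.', '(1) Next point.']
print(by_marker)  # ['First point. More detail.', '(1) Next point.']
```

Which one you want depends on whether each numbered item in the PDF can contain more than one sentence.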
Upvotes: 2