Bob

Reputation: 1396

Python: Split a String by Number

I'm reading in text from a PDF and want to split a string on a parenthesized number such as (1), keeping that number in the resulting pieces. So the string:

Some sentence. (1) Another Sentence. (2) Final Sentence.

Would turn into

Some sentence.
(1) Another Sentence.
(2) Final Sentence.

I've tried thestring.split('(') as a workaround, but some of the sentences contain parentheses themselves, which breaks the split. Thanks!

Upvotes: 0

Views: 389

Answers (3)

dang

Reputation: 118

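import re
text = "Some sentence. (1) Another Sentence. (2) Final Sentence."
m = re.search(r'\([0-9]\).*\.', text)
# regex: escaped paren, ONE digit 0-9, escaped paren,
# any sequence of characters (greedy), then an escaped dot
# re.search returns only the first match; because .* is greedy it runs to the last dot
# process the match object however you want, e.g. m.group(0)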

For building and testing regexes, I would use Rubular (note that it runs Ruby's regex engine, which differs from Python's re in places).
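
To collect every numbered piece rather than only the first match, one option is re.findall with a lazy quantifier. This is a minimal sketch, not part of the original answer: the pattern stops each piece just before the next parenthesized number or at the end of the string, and it assumes the text has no newlines. It does not return the text before the first marker.

```python
import re

text = "Some sentence. (1) Another Sentence. (2) Final Sentence."

# \(\d+\)  : a parenthesized number such as (1) or (12)
# .*?      : as few characters as possible (lazy)
# (?=...)  : stop right before the next marker, or at the end of the string
pieces = re.findall(r'\(\d+\).*?(?=\s*\(\d+\)|$)', text)
print(pieces)
```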

Upvotes: 1

Tim Biegeleisen

Reputation: 521269

I would split on the regex pattern \s+(?=\(\d+\)):

import re

inp = "Some sentence. (1) Another Sentence. (2) Final Sentence."
parts = re.split(r'\s+(?=\(\d+\))', inp)
print(parts)

This prints:

['Some sentence.', '(1) Another Sentence.', '(2) Final Sentence.']

The regex pattern used here says to split on one or more whitespace characters which are followed by something like (1), that is, a number contained within parentheses.
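
As a quick check against the case the question mentions (parentheses inside the sentences themselves), here is a small sketch. The function name split_on_markers and the sample text are made up for illustration; they are not from the question.

```python
import re

def split_on_markers(text):
    """Split text before each parenthesized number such as (1) or (12)."""
    # \s+ consumes the whitespace; the lookahead (?=\(\d+\)) only checks
    # that a marker follows, so the marker stays in the next piece.
    return re.split(r'\s+(?=\(\d+\))', text)

# Hypothetical sample with non-numeric parentheses and a two-digit marker.
sample = "Intro (see note). (1) First (draft) item. (12) Last item."
parts = split_on_markers(sample)
for part in parts:
    print(part)
```

Parentheses that do not wrap a number, such as (see note) and (draft), are left alone because the lookahead requires digits between the parentheses.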

Upvotes: 2

azro

Reputation: 54148

You can use (?<=\.)\s, which means "a whitespace character preceded by a dot":

import re

value = "Some sentence. (1) Another Sentence. (2) Final Sentence."
res = re.split(r"(?<=\.)\s", value)
print(res)  # ['Some sentence.', '(1) Another Sentence.', '(2) Final Sentence.']

Upvotes: 2
