Reputation: 137
I'd like to split a string on increasing item numbers in Python.
For example, I have the following string.
"1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"
And I want to get the following list from the above string.
[aaa aaa aa, bb bbbb bb, cc cccc cc, ddd d dddd, ... z zzzz zzz]
I tried the following code, but I couldn't get what I wanted, because str.split treats its argument as a literal separator rather than a pattern.
InputString = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"
OutputList = InputString.split("[1-99]. ")
Upvotes: 1
Views: 102
Reputation: 27723
This expression might also work:
import re
regex = r"(?<=[0-9]\.)\s*(.*?)(?=[0-9]{1,}\.|$)"
test_str = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"
print(re.findall(regex, test_str))
['aaa aaa aa. ', 'bb bbbb bb. ', 'cc cccc cc ', 'ddd d dddd ... ', 'z zzzz zzz']
The expression is explained in the top-right panel of regex101.com, if you wish to explore, simplify, or modify it; there you can also watch how it matches against some sample inputs.
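The matches above still carry each item's trailing period and whitespace. A minimal sketch of one way to trim them, assuming trailing periods are not meaningful in your data (clean_items is a hypothetical helper name, not part of the re module):

```python
import re

regex = r"(?<=[0-9]\.)\s*(.*?)(?=[0-9]{1,}\.|$)"
test_str = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd ... 99. z zzzz zzz"

def clean_items(matches):
    # Hypothetical helper: strip surrounding whitespace, drop trailing
    # periods, then strip again in case a space preceded those periods.
    return [m.strip().rstrip(".").strip() for m in matches]

items = clean_items(re.findall(regex, test_str))
print(items)
```

Note that this also removes the literal "..." after the fourth item, which may or may not be what you want.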
Upvotes: 0
Reputation: 32244
You can use the re module to split your string by a regular expression:
re.split(r'[0-9]+\.', input)
[0-9]+ matches one or more digits and \. matches the literal . character.
EDIT:
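You can prefix the regex with (?:\.\s)? to optionally consume a period left at the end of the previous item. The group is non-capturing because re.split would otherwise add the captured text (or None) to the result list.
re.split(r'(?:\.\s)?[0-9]+\.', input)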
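Putting the pieces together, here is a rough sketch rather than a definitive solution. It assumes a non-capturing group for the optional period, so that re.split does not return the group's text, and it drops the empty string that appears before the first number. The names split_numbered and text are made up for illustration, and the sample string is a shortened version of the one in the question.

```python
import re

def split_numbered(text):
    # Split on an optional ". " left over from the previous item,
    # followed by the item number and its period.
    parts = re.split(r'(?:\.\s)?[0-9]+\.', text)
    # Trim whitespace and discard empty pieces (e.g. the one before "1.").
    return [p.strip() for p in parts if p.strip()]

sample = "1. aaa aaa aa. 2. bb bbbb bb. 3. cc cccc cc 4. ddd d dddd 99. z zzzz zzz"
print(split_numbered(sample))
```

One caveat: any digits followed by a period inside an item (for example "version 2.") would also be treated as a separator, so this only suits input where numbers appear solely as item markers.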
Upvotes: 4