Split python string if newline starts with digit

Question

I am trying to split a text wherever a new line starts with a digit.

a="""1.pharagraph1 
    text1
    2.pharagraph2
    text2
    3.pharagraph3
    text3
    """

The expected result would be:

['1.pharagraph1 text1' , '2.pharagraph2 text2', '3.pharagraph3 text3']

I tried: re.split(' \d{1}',a) and it doesn't work for this task.

Wiktor Stribiżew · Accepted Answer

If you really have leading spaces and you did not make a typo when creating a sample string, you can use

[re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', x).strip() for x in re.split(r'\n[^\S\n]*(?=\d)', a)]
# => ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']

See the Python demo.

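The \n[^\S\n]*(?=\d) pattern matches a newline and then zero or more horizontal whitespace characters ([^\S\n]*), but only when a digit follows. The lookahead (?=\d) does not consume the digit, so it stays at the start of the next item. Then, inside each resulting chunk, every sequence of zero or more horizontal whitespace characters, a newline, and zero or more horizontal whitespace characters is replaced with a single space.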
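To make the two steps easier to follow, here is a minimal sketch that runs them separately on the sample string from the question. The variable names chunks and result are only illustrative, and the string is written with explicit \n escapes so the trailing space and indentation are visible.

```python
import re

# Same content as the sample in the question, with indentation before each item.
a = ("1.pharagraph1 \n    text1\n    2.pharagraph2\n    text2\n"
     "    3.pharagraph3\n    text3\n    ")

# Step 1: split at every newline (plus any indentation) that is followed by a digit.
chunks = re.split(r'\n[^\S\n]*(?=\d)', a)
print(chunks)
# ['1.pharagraph1 \n    text1', '2.pharagraph2\n    text2', '3.pharagraph3\n    text3\n    ']

# Step 2: inside each chunk, turn "spaces + newline + spaces" into one space,
# then strip what is left at the ends.
result = [re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', chunk).strip() for chunk in chunks]
print(result)
# ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']
```

Note that the digit inside "pharagraph1" or "text1" does not trigger a split, because the pattern requires a newline right before the (optionally indented) digit.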

If the string has no leading whitespace, you can use a simpler approach:

import re
a="""1.pharagraph1 
text1
2.pharagraph2
text2
3.pharagraph3
text3"""
print([x.replace("\n", " ") for x in re.split(r'\n(?=\d)', a)])
# => ['1.pharagraph1  text1', '2.pharagraph2 text2', '3.pharagraph3 text3']

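See the online Python demo. Here, the string is simply split at each newline that is followed by a digit (\n(?=\d)), and then every remaining newline is replaced with a space. The first item keeps two spaces because the sample has a trailing space before its first newline.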
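If the doubled space in the first item is unwanted, one possible variation (not part of the original answer) is to normalize whitespace with str.split() and " ".join() instead of str.replace(). This is only a sketch, and it assumes that collapsing every run of whitespace inside an item, not just the newlines, is acceptable.

```python
import re

# Sample without indentation; note the trailing space after "1.pharagraph1".
a = "1.pharagraph1 \ntext1\n2.pharagraph2\ntext2\n3.pharagraph3\ntext3"

# Split at each newline that is immediately followed by a digit.
parts = re.split(r'\n(?=\d)', a)

# Approach from the answer: the newline becomes a space, so " \n" turns into two spaces.
replaced = [p.replace("\n", " ") for p in parts]
print(replaced)
# ['1.pharagraph1  text1', '2.pharagraph2 text2', '3.pharagraph3 text3']

# Variation: split each part on any whitespace run and rejoin with single spaces.
normalized = [" ".join(p.split()) for p in parts]
print(normalized)
# ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']
```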
