Reputation: 35
I am trying to split a text wherever a new line starts with a digit.
a="""1.pharagraph1
text1
2.pharagraph2
text2
3.pharagraph3
text3
"""
The expected result would be:
['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']
I tried re.split('\n\d{1}',a), but it doesn't work for this task.
Upvotes: 1
Views: 585
Reputation: 627607
If you really have leading spaces and you did not make a typo when creating a sample string, you can use
[re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', x).strip() for x in re.split(r'\n[^\S\n]*(?=\d)', a)]
# => ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']
The \n[^\S\n]*(?=\d) pattern matches a newline followed by zero or more horizontal whitespace characters ([^\S\n]*), but only when a digit comes next. Then, in each resulting chunk, every sequence of zero or more horizontal whitespace characters, a newline, and zero or more horizontal whitespace characters is replaced with a single space.
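To make the two steps easier to follow, here is a minimal sketch that runs them separately. The indented sample string and the names sample, chunks, and cleaned are assumptions for illustration, not taken from the question.

```python
import re

# Assumed sample: continuation lines and numbered lines carry leading spaces.
sample = "1.pharagraph1\n   text1\n   2.pharagraph2\n   text2\n   3.pharagraph3\n   text3\n"

# Step 1: split at a newline plus optional horizontal whitespace,
# but only where a digit follows (the digit itself is not consumed).
chunks = re.split(r'\n[^\S\n]*(?=\d)', sample)

# Step 2: collapse each remaining line break, together with the
# spaces around it, into a single space, then trim the ends.
cleaned = [re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', c).strip() for c in chunks]

print(chunks)
print(cleaned)
```

Printing chunks before cleaned shows that the first step only decides where to cut, while the second step handles the line breaks left inside each item.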
If the string has no leading whitespace, you can use a simpler approach:
import re
a="""1.pharagraph1
text1
2.pharagraph2
text2
3.pharagraph3
text3"""
print( [x.replace("\n"," ") for x in re.split(r'\n(?=\d)', a)] )
# => ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']
Here, the string is simply split at each newline that is followed by a digit (\n(?=\d)), and then all remaining newlines are replaced with a space.
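As a rough illustration of why the lookahead matters, the sketch below compares a pattern that consumes the digit with one that only checks for it. It assumes a string with no leading whitespace; the names text, consumed, and kept are made up for this example.

```python
import re

# Assumed sample with no leading whitespace on any line.
text = "1.pharagraph1\ntext1\n2.pharagraph2\ntext2\n3.pharagraph3\ntext3"

# Consuming pattern: the digit is part of the match, so re.split drops it.
consumed = re.split(r'\n\d', text)

# Lookahead pattern: the digit is only checked, so it stays in the next item.
kept = re.split(r'\n(?=\d)', text)

print(consumed)
print(kept)
```

With the consuming pattern, the second and third items lose their leading digit, which is likely what goes wrong with a pattern such as \n\d{1} on a string like this one.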
Upvotes: 1
Reputation: 71471
You can use a lookahead to only split when the newline and spaces are followed by a digit:
import re
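# Raw string keeps the backslashes literal; \s+ needs at least one whitespace character after the newline.
result = re.split(r'\n\s+(?=\d)', a)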
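The split on its own leaves a line break inside each item. One possible follow-up step, sketched here under the assumption that the lines are indented (the sample string and the name joined are hypothetical), is to normalize the whitespace in each item:

```python
import re

# Assumed sample with indented lines, since \s+ needs whitespace after the newline.
a = "1.pharagraph1\n  text1\n  2.pharagraph2\n  text2\n  3.pharagraph3\n  text3\n"

result = re.split(r'\n\s+(?=\d)', a)

# str.split() with no arguments splits on any run of whitespace,
# so joining the pieces with ' ' flattens the inner line break.
joined = [' '.join(item.split()) for item in result]

print(joined)
```

This trades the second regex for plain string methods, which may be easier to read, at the cost of also collapsing any repeated spaces inside a line.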
Upvotes: 1