Jo_Saga

Reputation: 35

Split python string if newline starts with digit

I am trying to split a text wherever a new line starts with a digit.

a="""1.pharagraph1 
    text1
    2.pharagraph2
    text2
    3.pharagraph3
    text3
    """

The expected result would be:

['1.pharagraph1 text1' , '2.pharagraph2 text2', '3.pharagraph3 text3']

I tried re.split('\n\d{1}',a), but it doesn't work for this task.

Upvotes: 1

Views: 585

Answers (2)

Wiktor Stribiżew

Reputation: 627607

If the leading spaces are really part of your string and not a typo in the sample, you can use

import re
[re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', x).strip() for x in re.split(r'\n[^\S\n]*(?=\d)', a)]
# => ['1.pharagraph1 text1', '2.pharagraph2 text2', '3.pharagraph3 text3']


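The \n[^\S\n]*(?=\d) pattern matches a newline followed by zero or more horizontal whitespace characters ([^\S\n]*), but only when a digit comes next. The lookahead (?=\d) checks for the digit without consuming it, so the digit stays at the start of the next chunk. Then, inside each resulting chunk, every sequence of zero or more horizontal whitespace characters, a newline, and zero or more horizontal whitespace characters is replaced with a single space.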
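To make the two stages easier to follow, here is a minimal sketch that runs them separately on the question's sample. The string is written with explicit escapes so the trailing space on the first line is visible; the names chunks and result are only illustrative.

```python
import re

# The question's sample: continuation lines are indented by four spaces.
a = (
    "1.pharagraph1 \n"
    "    text1\n"
    "    2.pharagraph2\n"
    "    text2\n"
    "    3.pharagraph3\n"
    "    text3\n"
    "    "
)

# Stage 1: split at a newline (plus any horizontal whitespace)
# that is immediately followed by a digit.
chunks = re.split(r'\n[^\S\n]*(?=\d)', a)

# Stage 2: inside each chunk, collapse a line break together with the
# horizontal whitespace around it into one space, then trim the ends.
result = [re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', chunk).strip() for chunk in chunks]

print(chunks)
print(result)
```

After stage 1 each chunk still contains its inner line break and indentation; stage 2 is what produces the flat strings the question asks for.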

If the string has no leading whitespace, you can use a simpler approach:

import re
a="""1.pharagraph1 
text1
2.pharagraph2
text2
3.pharagraph3
text3"""
print( [x.replace("\n"," ") for x in re.split(r'\n(?=\d)', a)] )
# => ['1.pharagraph1  text1', '2.pharagraph2 text2', '3.pharagraph3 text3']

Here, the string is simply split at each newline that is followed by a digit (\n(?=\d)), and then every remaining newline is replaced with a space.
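The first item in that output has two spaces because the first line of the sample ends with a trailing space before the newline. If that matters, one possible variant (a sketch, not the only way) is to replace the newline together with any horizontal whitespace around it:

```python
import re

# Unindented sample; note the trailing space after "1.pharagraph1".
a = "1.pharagraph1 \ntext1\n2.pharagraph2\ntext2\n3.pharagraph3\ntext3"

# Same split as above, but swallow horizontal whitespace on both sides
# of each remaining newline so a trailing space does not double up.
result = [
    re.sub(r'[^\S\n]*\n[^\S\n]*', ' ', x)
    for x in re.split(r'\n(?=\d)', a)
]
print(result)
```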

Upvotes: 1

Ajax1234

Reputation: 71471

You can use a lookahead to only split when the newline and spaces are followed by a digit:

import re
# raw string, so \s and \d reach the regex engine without escape warnings
result = re.split(r'\n\s+(?=\d)', a)
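The split alone leaves the inner line break and indentation inside each piece. As a rough sketch of one way to get from there to the list in the question, each piece can be normalized with str.split and str.join (the names chunks and result are illustrative):

```python
import re

# The question's sample, written with explicit escapes.
a = (
    "1.pharagraph1 \n"
    "    text1\n"
    "    2.pharagraph2\n"
    "    text2\n"
    "    3.pharagraph3\n"
    "    text3\n"
    "    "
)

# Split where a newline plus at least one whitespace character precedes a digit.
chunks = re.split(r'\n\s+(?=\d)', a)

# split() with no arguments breaks on any run of whitespace (newlines included)
# and drops leading/trailing whitespace; join() puts single spaces back.
result = [" ".join(chunk.split()) for chunk in chunks]
print(result)
```

Note that \s+ needs at least one whitespace character after the newline, so this pattern would not split a string whose numbered lines start at column 0; using \s* instead should cover both layouts.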

Upvotes: 1
