Reputation: 1940
The code is as follows:
#coding=utf-8
import re
str = "The output is\n"
str += "1) python\n"
str += "A dynamic language\n"
str += "easy to learn\n"
str += "2) C++\n"
str += "difficult to learn\n"
str += "3244) PHP\n"
str += "easy to learn\n"
pattern = r'^[1-9]+\) .*'
print(re.findall(pattern, str, re.M))
The output is
['1) python', '2) C++', '3244) PHP']
However, I want to split it like this:
['1) python\nA dynamic language\neasy to learn\n', '2) C++\ndifficult to learn\n', '3244) PHP\neasy to learn\n']
That is, ignore the leading lines that do not start with "number)". Once a line starting with "number)" is found, that line and all following lines, up to the next line that starts with "number)", should be treated as one group. How should I rewrite the pattern?
Upvotes: 3
Views: 106
Reputation: 89547
You can use this, which allows digits inside a group as long as they are not followed by a closing parenthesis:
re.findall(r'\d+\)\s(?:\D+|\d+(?!\d*\)))*',str)
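To see how the pattern above behaves, here is a minimal sketch that runs it on the sample text from the question and on a second, made-up string whose body contains a digit. The names `text`, `with_digit`, `groups` and `groups_with_digit` are mine, not from the question (the question uses `str`, which shadows the built-in).

```python
import re

# Sample text from the question, with "easy" spelled out.
text = (
    "The output is\n"
    "1) python\n"
    "A dynamic language\n"
    "easy to learn\n"
    "2) C++\n"
    "difficult to learn\n"
    "3244) PHP\n"
    "easy to learn\n"
)

# \d+\)\s         a group header such as "1) "
# \D+             a run of non-digits, newlines included
# \d+(?!\d*\))    a run of digits that does not lead into ")"
pattern = r'\d+\)\s(?:\D+|\d+(?!\d*\)))*'

groups = re.findall(pattern, text)
print(groups)

# Hypothetical input: a plain digit in the body does not end the group.
with_digit = "1) python\nversion 3 is out\n2) C++\n"
groups_with_digit = re.findall(pattern, with_digit)
print(groups_with_digit)
```

The first call should give the three groups asked for in the question, and the second should keep "version 3 is out" inside the first group, since the "3" is not followed by ")".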
Upvotes: 2
Reputation: 250881
>>> import re
>>> strs = 'The output is\n1) python\nA dynamic language\neasy to learn\n2) C++\ndifficult to learn\n3244) PHP\neay to learn\n'
>>> re.findall(r'\d+\)\s[^\d]+',strs)
['1) python\nA dynamic language\neasy to learn\n',
'2) C++\ndifficult to learn\n',
'3244) PHP\neay to learn\n']
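One thing to keep in mind with this pattern: `[^\d]+` stops at the first digit it meets, so it works for the sample text but cuts a group short if the body itself contains a number. A small sketch with a made-up string (`sample` and `truncated` are illustrative names, not from the question):

```python
import re

# Hypothetical input: the body of the first group contains the digit 3.
sample = '1) python\nversion 3 is out\n2) C++\n'

# [^\d]+ cannot consume the "3", so the first match ends right before it.
truncated = re.findall(r'\d+\)\s[^\d]+', sample)
print(truncated)  # ['1) python\nversion ', '2) C++\n']
```

If the real data never has digits outside the "number)" headers, this is not a problem.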
Upvotes: 3
Reputation: 4776
You need to add the regex token for whitespace to your pattern so that it can match across the newline.
Try this:
regex = r"[1-9]+\) .*\s.*"
\s matches any whitespace character, including a newline.
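Note that the pattern above picks up each header line plus exactly one following line, because `.` does not match a newline and there is only one `\s`. The sketch below shows that on the question's sample text, and then tries a line-anchored variant of my own that keeps consuming lines until the next "number)" header. It assumes every line, including the last one, ends with `\n`; the names `text`, `one_extra_line`, `line_anchored` and `groups` are illustrative.

```python
import re

text = (
    "The output is\n"
    "1) python\n"
    "A dynamic language\n"
    "easy to learn\n"
    "2) C++\n"
    "difficult to learn\n"
    "3244) PHP\n"
    "easy to learn\n"
)

# Header line plus one more line only; "easy to learn" is lost from group 1.
one_extra_line = re.findall(r"[1-9]+\) .*\s.*", text)
print(one_extra_line)

# ^\d+\) .*\n             a header line (re.M makes ^ match at each line start)
# (?:(?!\d+\) ).*\n)*     any number of following lines that are not headers
line_anchored = r'^\d+\) .*\n(?:(?!\d+\) ).*\n)*'
groups = re.findall(line_anchored, text, re.M)
print(groups)
```

The second pattern uses `\d+` rather than `[1-9]+` so that headers such as "10)" are also recognised.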
Upvotes: 1