Reputation: 1
data 1
Hi there: first
Hello: second
welcome: third
data 2
Hi there: first
welcome: third
I want to write a regex that captures the bold text above. In data 2, the Hello: line is missing. How can I handle both cases with a single regex?
My code is:
import re
mat = re.search(r"Hi there:\n(.*)\n(Hello:\n(.*))?\nwelcome:\n(.*)", data1, re.DOTALL)
print(mat)
print(mat.group(1))
print(mat.group(2))
print(mat.group(3))
output I'm getting:
<_sre.SRE_Match object at 0x10694aca8>
first ->
Hello: second None None
Upvotes: 0
Views: 54
Reputation: 163577
You could use 3 capture groups and make the second one optional. You can omit re.DOTALL and instead match 0 or more whitespace chars with \s* after matching the newline:
(Hi there:)\r?\n\s*(?:(Hello:)\r?\n\s*)?(welcome:)
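As a quick check of how the optional group behaves, here is a small sketch that runs this pattern against input with and without the Hello: line. The sample strings and variable names are my own assumptions, modelled on the layout used in this answer.

```python
import re

# Same pattern as above; the (?:...)? wrapper makes the whole "Hello:" part optional.
pattern = re.compile(r"(Hi there:)\r?\n\s*(?:(Hello:)\r?\n\s*)?(welcome:)")

# Hypothetical sample inputs (assumed layout: labels separated by blank lines).
with_hello = "Hi there:\n\nHello:\n\nwelcome:"
without_hello = "Hi there:\n\nwelcome:"
windows_endings = "Hi there:\r\n\r\nwelcome:"  # \r?\n also accepts CRLF

groups_with = pattern.search(with_hello).groups()
groups_without = pattern.search(without_hello).groups()
groups_windows = pattern.search(windows_endings).groups()

print(groups_with)     # ('Hi there:', 'Hello:', 'welcome:')
print(groups_without)  # ('Hi there:', None, 'welcome:')
print(groups_windows)  # ('Hi there:', None, 'welcome:')
```

When the optional part does not take part in the match, its capture group is reported as None rather than raising an error, which is why one pattern covers both inputs.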
In the code, you could for example check whether group 2 is not None:
import re

regex = r"(Hi there:)\r?\n\s*(?:(Hello:)\r?\n\s*)?(welcome:)"
data1 = ("Hi there:\n\n"
         "Hello:\n\n"
         "welcome:")

mat = re.search(regex, data1)
if mat:
    print(mat.group(1))
    # Group 2 is None when the optional "Hello:" part is absent
    if mat.group(2) is not None:
        print(mat.group(2))
    print(mat.group(3))
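If the goal is the values rather than the labels, the same optional-group idea can be adapted. This is only a sketch under an assumed layout: each label sits on its own line with its value on the next line, separated by a plain \n (which is what the pattern in the question suggests). The sample strings and the names value_pattern and extract_values are made up for illustration.

```python
import re

# Assumption: label on one line, value on the next, "\n" line endings.
# No re.DOTALL, so each (.*) stops at the end of its own line.
value_pattern = re.compile(
    r"Hi there:\n(.*)\n"
    r"(?:Hello:\n(.*)\n)?"   # optional block; group 2 is None if it is missing
    r"welcome:\n(.*)"
)

def extract_values(text):
    """Return (first, second_or_None, third), or None if nothing matches."""
    mat = value_pattern.search(text)
    return mat.groups() if mat else None

# Hypothetical inputs reconstructed from the question's description.
data1 = "Hi there:\nfirst\nHello:\nsecond\nwelcome:\nthird"
data2 = "Hi there:\nfirst\nwelcome:\nthird"

print(extract_values(data1))  # ('first', 'second', 'third')
print(extract_values(data2))  # ('first', None, 'third')
```

The trailing \n is kept inside the optional block so that no extra newline is required when the Hello: part is absent.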
Upvotes: 0