Reputation: 35
I am trying to create a list of tuples from the data that follows the strings string1 and string3, but I am not getting the expected result.
s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'
re.findall(r'string1:(\d+)[\s,\S]+string3:([\s\S]+)', s)
Actual result:
[('1234', 'b5c6d7')]
Expected result:
[('1234', 'a1b2c3'), ('2345', 'b5c6d7')]
Upvotes: 2
Views: 334
Reputation: 196
The problem is that [\s,\S]+ is greedy, so it consumes everything between the first string1 and the last string3.
You can fix that by making the quantifiers non-greedy and adding a positive lookahead, like this:
string1:(\d+)[\s\S]*?string3:([\s\S]+?(?=string|$))
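To see the effect of greediness in isolation, here is a small sketch that runs a greedy and a non-greedy pattern against the sample string from the question. The variable names `greedy` and `lazy` are only illustrative, and the lazy pattern is one possible variant, not necessarily the exact one this answer intended.

```python
import re

s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'

# Greedy: [\s\S]+ runs to the end of the string, then backtracks
# only as far as the LAST 'string3:'.
greedy = re.findall(r'string1:(\d+)[\s\S]+string3:([\s\S]+)', s)

# Lazy: [\s\S]*? stops at the FIRST 'string3:' after each 'string1:',
# and the lookahead ends the second group before the next 'string'
# or at the end of the input.
lazy = re.findall(r'string1:(\d+)[\s\S]*?string3:([\s\S]+?)(?=string|$)', s)

print(greedy)  # [('1234', 'b5c6d7')]
print(lazy)    # [('1234', 'a1b2c3'), ('2345', 'b5c6d7')]
```

Note that the skipped part uses `*?` rather than `+?`, so the pattern still matches when string3 follows the digits directly, as in the second pair.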
Upvotes: 0
Reputation: 163517
Your current regex uses [\s,\S]+, which is greedy: it first consumes everything up to the end of the string and then backtracks only as far as the last string3.
You could make it non-greedy and add a positive lookahead (?=string|$) after the last group, which asserts that what follows is either string or the end of the string ($).
string1:(\d+).*?string3:(.*?)(?=string|$)
import re
s = 'string1:1234string2string3:a1b2c3string1:2345string3:b5c6d7'
print(re.findall(r'string1:(\d+).*?string3:(.*?)(?=string|$)', s))
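If the input can contain newlines, `.` will not cross them by default. The following is a minimal sketch of the same pattern compiled once with `re.DOTALL`; the names `PAIR` and `extract_pairs` are hypothetical helpers introduced here for illustration, and the multi-line sample string is an assumption, not part of the question.

```python
import re

# Same pattern as above; re.DOTALL lets '.' match newlines too.
PAIR = re.compile(r'string1:(\d+).*?string3:(.*?)(?=string|$)', re.DOTALL)

def extract_pairs(text):
    """Return a list of (digits after string1, data after string3) tuples."""
    return PAIR.findall(text)

# Assumed sample: a newline sits between string2 and string3.
s = 'string1:1234string2\nstring3:a1b2c3string1:2345string3:b5c6d7'
print(extract_pairs(s))  # [('1234', 'a1b2c3'), ('2345', 'b5c6d7')]
```

One caveat of the lookahead approach: if the data after string3 can itself contain the text string, the second group will be cut off at that point.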
Upvotes: 3