Reputation: 563
I am trying to extract a substring between two sets of patterns using re.search(). On the left, there can be either 0x or 0X, and on the right there can be either U, a space, or \n. The result should not contain the boundary patterns. For example, 0x1234U should result in 1234.
I tried the following search pattern: (0x|0X)(.*)(U| |\n), but it includes the left and right patterns in the result.
What would be the correct search pattern?
Upvotes: 0
Views: 1663
Reputation: 163207
You could also use a single capture group and read its value with .group(1):
0[xX](.*?)[U\s]
The pattern matches:
0[xX] matches either 0x or 0X
(.*?) captures in group 1 any character except a newline, as few as possible
[U\s] matches either U or a whitespace character (which could also be a newline)

import re
s = r"0x1234U"
pattern = r"0[xX](.*?)[U\s]"
m = re.search(pattern, s)
# re.search returns None when nothing matches, so check before using the result
if m:
    # group(1) holds only the captured part, without the boundaries
    print(m.group(1))
Output
1234
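To illustrate why the boundaries seemed to be "included", here is a minimal sketch comparing the whole match with the captured group, and a greedy quantifier with a non-greedy one. The sample string and variable names are made up for illustration and are not part of the question.

```python
import re

# Hypothetical input with two hex literals on one line.
s = "a = 0x1234U; b = 0XBEEFU"

m = re.search(r"0[xX](.*?)[U\s]", s)
whole = m.group(0)  # the entire match, boundaries included
inner = m.group(1)  # only the captured part

# A greedy .* runs to the last possible terminator on the line.
greedy = re.search(r"0[xX](.*)[U\s]", s).group(1)

print(whole)   # 0x1234U
print(inner)   # 1234
print(greedy)  # 1234U; b = 0XBEEF
```

Reading .group(0), or .group() with no argument, returns everything the pattern consumed, which is why the prefix and suffix show up there.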
Upvotes: 1
Reputation: 42143
You could use a combination of lookbehind and lookahead with a non-greedy match pattern in between:
import re
pattern = r"(?<=0[xX])(.*?)(?=[U\s\n])"
print(re.findall(pattern, "---0x1234U...0X456a "))
Output
['1234', '456a']
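Because lookarounds do not consume characters, the whole match is already the bare value, so no capture group is needed. The sketch below is a variation and not the pattern from the answer: it assumes the value consists only of hexadecimal digits and restricts the middle part to them. It also drops \n from the lookahead, since \s already matches a newline.

```python
import re

# Assumption: the value between the boundaries is made of hex digits only.
pattern = re.compile(r"(?<=0[xX])[0-9a-fA-F]+(?=[U\s])")

m = pattern.search("value = 0x1234U")
first = m.group(0)  # the whole match excludes the 0x prefix and the U suffix

# With no capture groups, findall returns the full matches.
found = pattern.findall("---0x1234U...0X456a ")

print(first)  # 1234
print(found)  # ['1234', '456a']
```

If the value may contain other characters, the original .*? in the middle is the safer choice.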
Upvotes: 1