Marko Gulin

Reputation: 563

Extract a string between two sets of patterns in Python

I am trying to extract a substring between two sets of patterns using re.search().

On the left, there can be either 0x or 0X, and on the right there can be either U, a space, or \n. The result should not contain the boundary patterns. For example, 0x1234U should result in 1234.

I tried the following search pattern: (0x|0X)(.*)(U| |\n), but it includes the left and right patterns in the result.

What would be the correct search pattern?
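For reference, a small sketch of what the pattern from the question actually returns. group(0) is the whole match, which is why the boundaries appear; the payload already sits in the second group. The second input string is made up for illustration and shows that the greedy .* can run past the first terminator when a line holds several literals.

```python
import re

# The pattern from the question: prefix, payload, terminator as three groups.
pattern = r"(0x|0X)(.*)(U| |\n)"

m = re.search(pattern, "0x1234U")
print(m.group(0))  # whole match, boundaries included: 0x1234U
print(m.group(2))  # second group only: 1234

# Illustrative input with two literals: greedy .* backtracks only to the LAST terminator.
m2 = re.search(pattern, "0x12U 0x34U")
print(m2.group(2))  # 12U 0x34
```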

Upvotes: 0

Views: 1663

Answers (2)

The fourth bird

Reputation: 163207

You could also use a single capture group and read it with .group(1):

0[xX](.*?)[U\s]

The pattern matches:

  • 0[xX] Match either 0x or 0X
  • (.*?) Capture in group 1 any character except a newline, as few times as possible (non-greedy)
  • [U\s] Match either U or a whitespace character (which also matches a newline)


import re

s = r"0x1234U"
pattern = r"0[xX](.*?)[U\s]"

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

1234
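The same pattern also works on longer text with several literals if you iterate over all matches. A minimal sketch using re.finditer; the input string below is hypothetical:

```python
import re

# Hypothetical multi-line input containing two hex literals.
text = "a = 0x1234U;\nb = 0XBEEF\n"
pattern = r"0[xX](.*?)[U\s]"

# finditer yields every non-overlapping match; group(1) holds the payload.
values = [m.group(1) for m in re.finditer(pattern, text)]
print(values)  # ['1234', 'BEEF']
```

Note that the pattern requires a terminator, so a literal at the very end of the string with nothing after it (for example "0x1234") is not matched.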

Upvotes: 1

Alain T.

Reputation: 42143

You could use a combination of lookbehind and lookahead with a non-greedy match pattern in between:

import re

# \s already includes \n, so [U\s] covers U, space and newline
pattern = r"(?<=0[xX])(.*?)(?=[U\s])"

print(re.findall(pattern, "---0x1234U...0X456a "))

['1234', '456a']
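When a pattern has exactly one capture group, re.findall returns only that group's text, so the boundaries are dropped even without lookarounds. A side-by-side sketch on the same input:

```python
import re

text = "---0x1234U...0X456a "

# One capture group: findall returns the captured payloads, not the full matches.
plain = re.findall(r"0[xX](.*?)[U\s]", text)

# Lookaround version: the boundaries are asserted but never consumed.
lookaround = re.findall(r"(?<=0[xX]).*?(?=[U\s])", text)

print(plain)       # ['1234', '456a']
print(lookaround)  # ['1234', '456a']
```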

Upvotes: 1
