How to remove characters found from this regex?

Question

import re

str = "<test>0</test>"
print(re.search("<.*?>", str).group())
print(re.search(">.*?<", str).group())
>> <test>
>> >0<

How can I get the results "test" and "0" without including the two characters I used as markers in the regex?

Andrew Clark · Accepted Answer

You shouldn't be using regex to parse XML/HTML, see murgatroid99's comment.
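For comparison, here is a minimal sketch of the parser route using the standard library's xml.etree.ElementTree. It assumes the input is the well-formed snippet "<test>0</test>". Real documents with nesting, attributes, or namespaces are exactly where a parser beats a regex.

```python
import xml.etree.ElementTree as ET

s = "<test>0</test>"

# fromstring parses the XML and returns the root Element.
root = ET.fromstring(s)
print(root.tag)   # test
print(root.text)  # 0
```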

That being said, here is how you can get the results you want for this example using regex. Use a capturing group:

>>> import re
>>> s = "<test>0</test>"
>>> print(re.search(r"<(.*?)>", s).group(1))
test
>>> print(re.search(r">(.*?)<", s).group(1))
0
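If you need this for arbitrary marker pairs, the capturing-group idea can be wrapped in a small helper. The name between is hypothetical and not part of the re module. The helper also guards against re.search returning None, which would otherwise make .group(1) raise AttributeError when there is no match.

```python
import re

def between(open_marker, close_marker, text):
    """Hypothetical helper: return the text inside the first
    open_marker ... close_marker pair, or None if there is no such pair."""
    # re.escape makes the markers match literally, even if they are
    # regex metacharacters such as "[" or "(".
    pattern = re.escape(open_marker) + r"(.*?)" + re.escape(close_marker)
    m = re.search(pattern, text)
    # group(1) is only the captured part, so the markers are excluded.
    return m.group(1) if m else None

s = "<test>0</test>"
print(between("<", ">", s))  # test
print(between(">", "<", s))  # 0
print(between("[", "]", s))  # None
```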

Note that you shouldn't use str as a variable name, because it shadows the built-in str type.
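A small sketch of what shadowing does in practice. The function name shadowed is hypothetical and used only for illustration. Once the name str refers to a string, calling str(...) no longer converts anything.

```python
def shadowed():
    # A local variable named str hides the built-in str type inside this function.
    str = "<test>0</test>"
    try:
        return str(42)  # str is now a string object, not the type
    except TypeError as exc:
        return f"TypeError: {exc}"

print(shadowed())  # TypeError: 'str' object is not callable
print(str(42))     # 42 (the built-in is untouched outside the function)
```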

An alternative to a capturing group would be a lookbehind and lookahead:

>>> print(re.search(r"(?<=<).*?(?=>)", s).group())
test
>>> print(re.search(r"(?<=>).*?(?=<)", s).group())
0
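The lookarounds only assert that the markers are present; they don't consume them. That means the whole match already excludes them, and re.findall returns clean results for every occurrence. One limitation worth knowing is that Python's re module only accepts fixed-width lookbehinds. The string "<test>0</test>" is assumed as input here.

```python
import re

s = "<test>0</test>"

# (?<=<) and (?=>) check for the markers without including them,
# so each full match (group 0) is just the text between them.
tags = re.findall(r"(?<=<).*?(?=>)", s)
texts = re.findall(r"(?<=>).*?(?=<)", s)
print(tags)   # ['test', '/test']
print(texts)  # ['0']

# A variable-width lookbehind such as (?<=<\w+) is rejected at compile time.
try:
    re.compile(r"(?<=<\w+)>")
    variable_width_ok = True
except re.error:
    variable_width_ok = False
print(variable_width_ok)  # False
```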

Using raw string literals (r"...") isn't necessary for these particular patterns, but it is a good habit when writing regular expressions, because it ensures that backslashes reach the regex engine unchanged.
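A quick sketch of where the raw-string habit matters. The escape \b means "backspace" to Python's string parser but "word boundary" to the regex engine. The sample text "test 0" is an assumption chosen for illustration.

```python
import re

text = "test 0"

# Without r"", Python turns "\b" into a single backspace character (\x08)
# before re ever sees it, so this pattern looks for literal backspaces.
plain = re.findall("\b0\b", text)
# With r"", re receives a backslash followed by "b", which it reads as a word boundary.
raw = re.findall(r"\b0\b", text)

print(len("\b"), len(r"\b"))  # 1 2
print(plain)  # []
print(raw)    # ['0']
```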
