Philip Kessler

Reputation: 11

regex end search on specific character

I have a string that I am trying to search with re.search. The string consists of an id followed by some text, so it looks like this: <@randomId> string after

I am using the regex pattern "^@(|[WU].+?)>(.*)" in re.search to try to get two groups. The first group is the id without the surrounding < >, so just "@randomId". The second group is the "string after" text that follows the id. So if the text I pass to re.search is "<@QWE1234> do this", I want to match and return "@QWE1234" and "do this".

With this pattern, re.search returns None. When I add < to the pattern, so that it reads "^<@(|[WU].+?)>(.*)", I get the whole string back.

Upvotes: 1

Views: 1092

Answers (3)

The fourth bird

Reputation: 163247

To get the two capturing groups, you could remove the |[WU] part from your regex and add \s+ to consume the whitespace that follows the >, so that you don't have to trim the second group.

Your regex could look like: ^<(@.+?)>\s+(.*)

Or instead of using .+?, you could use [^>]+

<(@[^>]+)>\s+(.*)

That would match

  • Match <
  • (@[^>]+) Capture in group 1 an @ followed by one or more characters other than >, using a negated character class
  • Match >
  • \s+ Match one or more whitespace characters
  • (.*) Capture zero or more characters in group 2 (If there has to be at least 1 character following you could use .+ instead)
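As a quick check of the pattern above, here is a minimal sketch that runs it through Python's re module on the sample input from the question. The helper name split_mention is hypothetical and only introduced for illustration.

```python
import re

# Pattern from this answer: "<", then "@" plus everything up to ">" (group 1),
# then ">", one or more whitespace characters, and the rest (group 2).
MENTION = re.compile(r"<(@[^>]+)>\s+(.*)")


def split_mention(text):
    """Hypothetical helper: return (id, rest), or None if there is no match."""
    match = MENTION.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_mention("<@QWE1234> do this"))  # ('@QWE1234', 'do this')
print(split_mention("no mention here"))     # None
```

Because \s+ consumes the whitespace after the >, the second group needs no trimming.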


If you only want to allow uppercase characters and numbers, you could use:

<(@[0-9A-Z@]+)>\s+(.*)
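To see what the stricter character class changes, this small sketch applies the pattern exactly as written above to an uppercase id and to a lowercase one. The lowercase input is an assumed example and does not come from the question.

```python
import re

# Stricter pattern, copied as written in this answer.
STRICT = re.compile(r"<(@[0-9A-Z@]+)>\s+(.*)")

upper = STRICT.search("<@QWE1234> do this")  # id uses only A-Z and 0-9
lower = STRICT.search("<@qwe1234> do this")  # lowercase id (assumed example)

print(upper.groups())  # ('@QWE1234', 'do this')
print(lower)           # None
```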

Upvotes: 1

Philip Kessler

Reputation: 11

The regex "^<@(|[WU].+?)>(.*)" was the correct one after all. The problem was that I was not returning the right groups from the match object that re.search gives back. I had to return them explicitly: return (matches.group(1), matches.group(2).strip())
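A short sketch of the difference between the whole match and the individual groups. The sample id starts with U because this pattern requires W or U right after the @. The input string is an assumption and not taken from the question.

```python
import re

# Assumed sample input; the id starts with "U" to satisfy [WU] in the pattern.
match = re.search(r"^<@(|[WU].+?)>(.*)", "<@U12345> do this")

whole = match.group(0)         # the entire matched text
user_id = match.group(1)       # first group; the "@" sits outside the parentheses
rest = match.group(2).strip()  # second group, with the leading space removed

print(whole)    # <@U12345> do this
print(user_id)  # U12345
print(rest)     # do this
```

Note that with this pattern the @ is not part of group 1. Moving it inside the parentheses would include it in the captured id.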

Upvotes: 0

Jasmijn

Reputation: 10452

^ matches the start of the string, so you would want your pattern to be either "^<@(|[WU].+?)>(.*)" or "@(|[WU].+?)>(.*)". Note that the pattern is a bit more complicated than it needs to be: "^<@(.+?)>(.*)" should work. You might also want to pull the @ inside the group, because that way the result matches the output you described.

So then your code would be something like:

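import re

text = "<@QWE1234> do this"  # example input from the question
match = re.search(r"^<(@.+?)>(.*)", text)
if match is None:
    pass  # handle the case that it is not found
else:
    randomId = match.group(1)   # "@QWE1234"
    textAfter = match.group(2)  # " do this" (leading space included)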

Upvotes: 0
