Reputation: 15136

How to match regex in python?

describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do

I'm trying to extract sg-ezsrzerzer from it (so I want to match from sg- up to the closing double quote). I'm using Python.

I currently have:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-.*\b', a)
print(test)

output is

['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']

How do I only get ['sg-ezsrzerzer']?

Upvotes: 0

Answers (3)

Ryszard Czech

Reputation: 18641

Match until the first word boundary with \w+:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-\w+', a)
print(test[0])

See Python proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  sg-                      'sg-'
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))

Results: sg-ezsrzerzer

Upvotes: 0

The fourth bird

Reputation: 163632

The pattern \bsg-.*\b matches too much because .* first runs to the end of the string and then backtracks only as far as the nearest word boundary, which lies between the final o and the end of the string.
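To see that behaviour side by side, here is a minimal sketch comparing the greedy pattern from the question with a negated character class that cannot cross the closing double quote. The variable names greedy and bounded are only illustrative.

```python
import re

a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Greedy: .* runs to the end of the string, then settles on the last \b.
greedy = re.findall(r'\bsg-.*\b', a)

# Negated class: [^"]+ stops right before the first double quote.
bounded = re.findall(r'\bsg-[^"]+', a)

print(greedy)
print(bounded)
```

The first call reproduces the over-long match from the question, while the second returns only the security group id.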

If you are using re.findall you can also use a capture group instead of lookarounds and the group value will be in the result.

:group_id=>"(sg-[^"\r\n]+)"

The pattern matches:

:group_id=>" Match literally
(sg-[^"\r\n]+) Capture group 1 match sg- and 1+ times any char except " or a newline
" Match the double quote

See a regex demo or a Python demo

For example

import re

pattern = r':group_id=>"(sg-[^"\r\n]+)"'
s = "describe aws_security_group({:group_id=>\"sg-ezsrzerzer\", :vpc_id=>\"vpc-zfds54zef4s\"}) do"

print(re.findall(pattern, s))

Output

['sg-ezsrzerzer']

Upvotes: 0

JPI93

Reputation: 1557

The pattern (?<=group_id=\>").+?(?=\") would work nicely if the goal is to extract the group_id value within a given string formatted as in your example.

(?<=group_id=\>") Looks behind for the sub-string group_id=>" before the string to be matched.

.+? Matches one or more of any character lazily.

(?=\") Looks ahead for the character " following the match (effectively making the expression .+ match any character except a closing ").

If you only want to extract sub-strings where the group_id starts with sg- then you can simply add this to the matching part of the pattern as follows (?<=group_id=\>")sg\-.+?(?=\")

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string: avoids the invalid "\>" escape warning; > and " need no escaping in a regex
results = re.findall(r'(?<=group_id=>").+?(?=")', s)

print(results)

Output

['sg-ezsrzerzer']
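The sg- restricted variant mentioned above is not exercised in the example, so here is a small sketch of it. The second input string is a made-up value, used only to show that an id without the sg- prefix is skipped.

```python
import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Same lookarounds, but the match itself must start with sg-.
sg_pattern = r'(?<=group_id=>")sg-.+?(?=")'

sg_only = re.findall(sg_pattern, s)
print(sg_only)

# Hypothetical input: a group_id value that does not start with sg-.
other = ':group_id=>"vpc-zfds54zef4s"'
skipped = re.findall(sg_pattern, other)
print(skipped)
```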

Of course you could alternatively use re.search instead of re.findall to find the first instance of a sub-string matching the above pattern in a given string - depends on your use case I suppose.

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string: avoids the invalid "\>" escape warning; > and " need no escaping in a regex
result = re.search(r'(?<=group_id=>").+?(?=")', s)

if result:
    result = result.group()

print(result)

Output

sg-ezsrzerzer

If you decide to use re.search, note that it returns None when no match is found in the input string and an re.Match object when one is. That is why the example above uses an if statement and calls result.group() to extract the matching string only when a match is present.
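If that None check is needed in several places, it could be wrapped in a small helper. This is only a sketch: find_group_id is a hypothetical name, not something from the re module.

```python
import re
from typing import Optional

def find_group_id(text: str) -> Optional[str]:
    """Return the first group_id value in text, or None if there is none."""
    match = re.search(r'(?<=group_id=>").+?(?=")', text)
    # re.search returns None when nothing matches, so guard before .group()
    return match.group() if match else None

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
found = find_group_id(s)
missing = find_group_id('no group id in this string')
print(found)
print(missing)
```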

Upvotes: 1
