Reputation: 15076
describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do
I am trying to extract sg-ezsrzerzer from it (that is, everything from sg- up to the next double quote). I'm using Python and currently have:
import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-.*\b', a)
print(test)
The output is:
['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']
How do I get only ['sg-ezsrzerzer']?
Upvotes: 0
Views: 85
Reputation: 18611
Match only word characters after sg- with \w+. The match then stops at the first non-word character (here the double quote):
import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-\w+', a)
print(test[0])
EXPLANATION
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
sg- 'sg-'
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
Result: sg-ezsrzerzer
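One caveat, sketched below: \w does not include -, so if a value could contain further hyphens, \w+ would cut it short and a class such as [\w-]+ would be needed. The sg-abc-123 value is made up for illustration; the ID in the question has no extra hyphens, so \w+ is enough there.

```python
import re

# Hypothetical value with an extra hyphen (not from the question).
line = ':group_id=>"sg-abc-123"'

word_only = re.findall(r'\bsg-\w+', line)       # \w excludes '-', so the match stops early
with_hyphen = re.findall(r'\bsg-[\w-]+', line)  # also allow '-' inside the value

print(word_only)    # ['sg-abc']
print(with_hyphen)  # ['sg-abc-123']
```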
Upvotes: 0
Reputation: 163217
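The pattern \bsg-.*\b matches too much. The .* is greedy, so it first consumes everything up to the end of the string. \b then has to match at that position, and it does, because the string ends with the word character o (of do). So the match runs all the way to the end of the string.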
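A quick way to see this is to compare the greedy pattern with a lazy one and with a negated character class on the same string. Note that a lazy .*? does not help on its own: the first word boundary after sg- lies between - and e, so it returns just sg-.

```python
import re

a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

greedy = re.findall(r'\bsg-.*\b', a)    # .* runs to the end; \b matches after the final "o"
lazy = re.findall(r'\bsg-.*?\b', a)     # .*? stops at the first boundary, right after "sg-"
bounded = re.findall(r'\bsg-[^"]*', a)  # stop at the first double quote instead

print(greedy)   # ['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']
print(lazy)     # ['sg-']
print(bounded)  # ['sg-ezsrzerzer']
```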
Since you are using re.findall, you can use a capture group instead of lookarounds: when the pattern contains exactly one capture group, findall returns the group's value rather than the whole match.
:group_id=>"(sg-[^"\r\n]+)"
The pattern matches:
:group_id=>"      Match literally
(sg-[^"\r\n]+)    Capture group 1: match sg- followed by 1+ of any char except " or a newline
"                 Match the closing double quote
For example
import re
pattern = r':group_id=>"(sg-[^"\r\n]+)"'
s = "describe aws_security_group({:group_id=>\"sg-ezsrzerzer\", :vpc_id=>\"vpc-zfds54zef4s\"}) do"
print(re.findall(pattern, s))
Output
['sg-ezsrzerzer']
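This works because of how re.findall treats groups. A small sketch: with one group you get a list of strings, with several groups a list of tuples. The second pattern, which also captures vpc_id, is an illustrative extension and not part of the answer above.

```python
import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# One group: findall returns a list of strings (the group values).
one = re.findall(r':group_id=>"(sg-[^"\r\n]+)"', s)
# Two groups: findall returns a list of tuples, one item per group.
two = re.findall(r':group_id=>"([^"\r\n]+)", :vpc_id=>"([^"\r\n]+)"', s)

print(one)  # ['sg-ezsrzerzer']
print(two)  # [('sg-ezsrzerzer', 'vpc-zfds54zef4s')]
```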
Upvotes: 0
Reputation: 1557
The pattern (?<=group_id=\>").+?(?=\")
would work nicely if the goal is to extract the group_id
value within a given string formatted as in your example.
(?<=group_id=\>")
Looks behind for the sub-string group_id=>"
before the string to be matched.
.+?
Matches one or more of any character lazily.
(?=\")
Looks ahead for the character " following the match. Because .+? is lazy, the match ends just before the first " after the prefix, so the closing quote is never included.
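To illustrate that lookarounds are zero-width, here is a sketch comparing the lookaround pattern with a capture-group pattern written only for comparison (group_id=>"(.+?)" is my own variant). The lookaround match excludes the prefix and the quote, while the capture-group version includes them in the whole match and needs .group(1) to get the value.

```python
import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

look = re.search(r'(?<=group_id=>").+?(?=")', s)  # prefix and quote are checked, not consumed
grp = re.search(r'group_id=>"(.+?)"', s)          # prefix and quote are consumed

print(look.group())  # sg-ezsrzerzer
print(grp.group(0))  # group_id=>"sg-ezsrzerzer"
print(grp.group(1))  # sg-ezsrzerzer
```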
If you only want to extract values where the group_id starts with sg-, add that prefix to the matching part of the pattern: (?<=group_id=\>")sg\-.+?(?=\")
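A sketch of the sg- restricted pattern applied to two lines; the second line, with a group_id of other-123, is hypothetical and only there to show a non-match. In Python's re the backslashes before > and - are not required, so they are dropped here.

```python
import re

pattern = r'(?<=group_id=>")sg-.+?(?=")'
lines = [
    'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do',
    'describe aws_security_group({:group_id=>"other-123"}) do',  # hypothetical non-sg value
]

# One findall result per line: a match for the sg- value, an empty list otherwise.
results = [re.findall(pattern, line) for line in lines]
print(results)  # [['sg-ezsrzerzer'], []]
```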
import re
s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
results = re.findall(r'(?<=group_id=\>").+?(?=\")', s)  # raw string: backslashes reach the regex engine unchanged
print(results)
Output
['sg-ezsrzerzer']
Alternatively, you could use re.search instead of re.findall to find only the first substring matching the above pattern; which one fits depends on your use case.
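To make the difference concrete, here is a sketch on a made-up two-line input: re.findall returns every match, while re.search returns only the first one.

```python
import re

pattern = r'(?<=group_id=>").+?(?=")'
# Hypothetical input containing two resources.
text = (
    'describe aws_security_group({:group_id=>"sg-first", :vpc_id=>"vpc-1"}) do\n'
    'describe aws_security_group({:group_id=>"sg-second", :vpc_id=>"vpc-2"}) do'
)

all_ids = re.findall(pattern, text)  # every match, as a list of strings
first = re.search(pattern, text)     # only the first match, as a Match object (or None)
first_id = first.group() if first else None

print(all_ids)   # ['sg-first', 'sg-second']
print(first_id)  # sg-first
```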
import re
s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
result = re.search(r'(?<=group_id=\>").+?(?=\")', s)
if result:
    result = result.group()  # extract the matched text as a str
print(result)
Output
sg-ezsrzerzer
If you use re.search, it returns None when there is no match in the input string and an re.Match object when there is. That is why the example above has the if statement and the call to result.group() to extract the matched string.
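The same None check can be wrapped in a small helper. extract_group_id is a hypothetical name, and precompiling the pattern with re.compile is just one reasonable choice when the pattern is applied to many lines.

```python
import re
from typing import Optional

# Hypothetical helper; the pattern is the lookaround pattern from this answer.
GROUP_ID_RE = re.compile(r'(?<=group_id=>").+?(?=")')


def extract_group_id(line: str) -> Optional[str]:
    """Return the first group_id value in line, or None if there is no match."""
    match = GROUP_ID_RE.search(line)
    return match.group() if match else None


print(extract_group_id('describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'))
print(extract_group_id('no group id in this line'))
```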
Upvotes: 1