Reputation: 11
I am trying to extract the value/argument of each trigger in Jenkinsfiles between the parentheses and the quotes if they exist.
For example, given the following:
upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS) # just parentheses
pollSCM('H * * * *') # single quotes and parentheses
Desired result respectively:
upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *
My current result:
upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *' # Notice the trailing single quote
So far I have been successful with the first trigger (upstream one), but not for the second one (pollSCM) because there's still a trailing single quote.
After the group (.+), it doesn't capture the trailing single quote with \'*, but it does capture the closing parenthesis with \). I could simply use .replace() or .strip() to remove it, but what is wrong with my regex pattern? How can I improve it? Here's my code:
import re

pattern = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
text1 = r"upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = r"pollSCM('H * * * *')"
trigger_value1 = re.search(pattern, text1).group(1)
trigger_value2 = re.search(pattern, text2).group(1)
Upvotes: 1
Views: 877
Reputation: 2993
The \'* part of your pattern means zero or more single quotes, so the greedy .+ grabs the last ' itself and leaves \'* to match nothing. Add ? after the quantifier, making the group (.+?), so that it is no longer greedy. It then captures as little as possible and stops as soon as the rest of the pattern, \'*\"*\), can match.
This pattern will work for you:
[A-Za-z]*\(\'*\"*(.+?)\'*\"*\)
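As a quick check of the difference, here is a minimal sketch that runs the question's greedy pattern and the lazy pattern above against the two sample triggers. The names GREEDY, LAZY and trigger_value are illustrative only and do not come from the question.

```python
import re

# Greedy version from the question and the lazy version suggested above.
GREEDY = r"[A-Za-z]*\(\'*\"*(.+)\'*\"*\)"
LAZY = r"[A-Za-z]*\(\'*\"*(.+?)\'*\"*\)"

def trigger_value(pattern, text):
    """Return the first captured group, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

text1 = "upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS)"
text2 = "pollSCM('H * * * *')"

print(trigger_value(GREEDY, text2))  # H * * * *'
print(trigger_value(LAZY, text2))    # H * * * *
print(trigger_value(LAZY, text1))
```

The lazy group still returns the full argument list for text1, because the inner quotes there are not followed by a closing parenthesis.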
[UPDATE]
To answer your question below I'll just add it here.
So the ? will make it not greedy up until the next character indicated in the pattern?
Yes, it turns a greedy repetition operator into a lazy one (quantifiers are greedy by default). So .*?a matches up to and including the first a, while .*a matches up to and including the last a in the string. If your string is aaaaaaaa and your regex is .*?a, each a is matched separately. As an example, if you use .*?a with a substitution of b for every match on the string aaaaaaaa, you get the string bbbbbbbb. With .*a, the same substitution on aaaaaaaa returns a single b.
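The substitution example can be reproduced with re.sub. This is only a small sketch, and the variable names are illustrative.

```python
import re

text = "aaaaaaaa"

# Lazy: each match is a single "a", so every "a" is replaced separately.
lazy_result = re.sub(r".*?a", "b", text)

# Greedy: one match covers the whole string, so there is one replacement.
greedy_result = re.sub(r".*a", "b", text)

print(lazy_result)    # bbbbbbbb
print(greedy_result)  # b
```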
Here's a link that explains the different quantifier types (greedy, lazy, possessive): http://www.rexegg.com/regex-quantifiers.html
Upvotes: 1
Reputation: 82765
import re
s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS) # just parentheses
pollSCM('H * * * *')"""
print(re.findall(r"\((.*?)\)", s))
Output:
["upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS", "'H * * * *'"]
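Note that the quotes around the pollSCM value are still present in this output. One possible follow-up, sketched here with a hypothetical helper named trigger_values, is to trim quotes from the ends of each captured value with str.strip, as the question itself suggests. This assumes a value never legitimately begins or ends with a quote that should be kept.

```python
import re

def trigger_values(text):
    """Capture the text inside each pair of parentheses and trim outer quotes."""
    return [value.strip("'\"") for value in re.findall(r"\((.*?)\)", text)]

s = """upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS) # just parentheses
pollSCM('H * * * *')"""

for value in trigger_values(s):
    print(value)
```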
Upvotes: 2
Reputation: 163277
For your example data you could make the ' optional with '?, capture your values in a group, and then loop through the captured groups.
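import re

# The pattern definition is missing from this post; the one below is an
# assumption based on the description above (optional ' on each side).
regex = r"\('?(.*?)'?\)"
test_str = ("upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS) # just parentheses\n"
            "pollSCM('H * * * *') # single quotes and parentheses")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches):
    # Groups are numbered from 1; group 0 is the whole match.
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print(match.group(groupNum))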
That would give you:
upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS
H * * * *
To get a stricter match you could use an alternation to match between () or (''), but not with a single ' as in ('H * * * *), and then loop through the captured groups. Because you now capture two groups, one of which is empty, you could check that you only retrieve the non-empty group.
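The stricter pattern is not shown here, so the following is only one possible sketch of the alternation idea. The pattern STRICT and the helper extract_strict are assumptions for illustration: the first alternative captures a fully quoted value, the second captures an unquoted value that does not start with '.

```python
import re

# Assumed pattern: either ('...') with both quotes, or (...) not starting with '.
STRICT = re.compile(r"\((?:'([^']*)'|([^'()][^()]*))\)")

def extract_strict(text):
    """Return the captured value of each trigger, using whichever group matched."""
    values = []
    for match in STRICT.finditer(text):
        quoted, bare = match.groups()
        # The group that did not take part in the match is None.
        values.append(quoted if quoted is not None else bare)
    return values

test_str = ("upstream(upstreamProjects: 'upstreamJob', threshold: hudson.model.Result.SUCCESS) # just parentheses\n"
            "pollSCM('H * * * *') # single quotes and parentheses")

print(extract_strict(test_str))
print(extract_strict("pollSCM('H * * * *)"))  # opening quote only: no match
```

With this sketch, an argument list that starts with a quoted string and continues past it, such as ('a', 'b'), is not matched at all, so the pattern would need adjusting for that case.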
Upvotes: 0