Martin

Reputation: 726

Regex to find string arguments of a function call (lookbehind with multiple hits)

I want to use grep (PCRE) to find all single-quoted strings that are passed to my function foo().

Example function calls in my source code and the expected hits:

foo('Alice')                    -> Expected Hits: Alice
foo('Alice', 'Bob', 'Charlie')  -> Expected Hits: Alice, Bob, Charlie
foo(flag ? 'Alice' : 'Bob')     -> Expected Hits: Alice, Bob

My regex:

foo\([^\)]*\K(?:')([^'\)]*)(?:'\))

However, I only get the last single-quoted string of each function call rather than all of them, as you can see in my regex101 playground: https://regex101.com/r/FlzDYp/1

How can I write a PCRE-compatible regex for grep that returns all expected hits?

Upvotes: 1

Views: 111

Answers (2)

Tim Biegeleisen

Reputation: 522581

In JavaScript, we can first match every foo(...) call with matchAll, then run matchAll a second time on each call's argument list to collect the string parameters:

var input = `foo('Alice')
foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')`;

// First pass: capture the argument list of every foo(...) call.
var calls = Array.from(input.matchAll(/foo\((.*?)\)/g));
// Second pass: collect the single-quoted strings in each argument list.
var args = calls.map(call => Array.from(call[1].matchAll(/'(.*?)'/g)));
for (var i = 0; i < calls.length; ++i) {
    console.log(calls[i][0] + ": " + args[i].map(arg => arg[1]));
}
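
As a rough sketch, the same two-pass idea can be wrapped in a function that returns one flat list of hits. The name fooStringArgs and the added \b word boundary are assumptions for illustration, not part of the answer above. Like the original, it ends the argument list at the first ), so nested calls or a ) inside a string would be cut short.

```javascript
// Hypothetical helper: returns every single-quoted string passed to foo().
function fooStringArgs(source) {
  const hits = [];
  // Pass 1: each foo(...) call; \b keeps names such as "barfoo" from matching.
  for (const call of source.matchAll(/\bfoo\((.*?)\)/g)) {
    // Pass 2: every single-quoted string inside the captured argument list.
    for (const arg of call[1].matchAll(/'(.*?)'/g)) {
      hits.push(arg[1]);
    }
  }
  return hits;
}

const sample = `foo('Alice')
foo('Alice', 'Bob', 'Charlie')
foo(flag ? 'Alice' : 'Bob')`;
console.log(fooStringArgs(sample));
```

Strings passed to other functions on the same line are ignored, because the second pass only ever sees the text captured between foo( and the first ).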

Upvotes: 2

The fourth bird

Reputation: 163577

You might use GNU grep with -P for PCRE and -o to print only the matched parts.

The pattern in parts matches:

  • (?: Non capture group
    • \bfoo\( Match the word foo followed by (
    • (?=[^()]*\)) Positive lookahead to assert a closing ) to the right
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, but not at the start of the string (as \G can match at those 2 positions)
  • ) Close the non capture group
  • [^']* Match zero or more characters other than '
  • (?:'\h*[,:]\h*)? Optionally match a ' followed by either , or : between optional horizontal whitespace
  • ' Match the '
  • \K Forget what is matched so far as we don't want that ' in the result
  • \w+ Match 1 or more word characters

Example:

grep -oP "(?:\bfoo\((?=[^()]*\))|\G(?!^))[^']*(?:'\h*[,:]\h*)?'\K\w+" file

See a regex demo for the matches.
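
To make the \G chaining easier to follow, here is a rough JavaScript emulation of the grep pattern for a single line of input. JavaScript has neither \G nor \K, so this sketch swaps in the sticky flag (y) for \G and a capture group for \K, and approximates \h with [ \t]. The function name chainedFooStrings is hypothetical.

```javascript
// Hypothetical helper: emulates the \G-chained PCRE pattern on one line.
function chainedFooStrings(line) {
  // Entry point: the word foo, then (, with a closing ) somewhere to the right.
  const entry = /\bfoo\((?=[^()]*\))/g;
  // One link of the chain. The sticky flag forces the match to start exactly
  // at lastIndex (the role of \G); the capture group stands in for \K.
  const link = /[^']*(?:'[ \t]*[,:][ \t]*)?'(\w+)/y;
  const hits = [];
  while (entry.exec(line) !== null) {
    let pos = entry.lastIndex;
    link.lastIndex = pos;
    let m;
    while ((m = link.exec(line)) !== null) {
      hits.push(m[1]);
      pos = link.lastIndex; // the next link must start where this one ended
    }
    entry.lastIndex = pos; // look for the next foo( after the chain broke
  }
  return hits;
}

console.log(chainedFooStrings("foo('Alice', 'Bob', 'Charlie')"));
console.log(chainedFooStrings("foo(flag ? 'Alice' : 'Bob')"));
```

The chain breaks after the last argument because the closing ' is followed by ) rather than by , or : and another quoted word, so no further link can match at that position.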


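An alternative using awk: first select the lines that contain the format foo(....), then use a while loop to print every value between single quotes that consists only of alphanumeric characters or underscores. Note that this prints all such quoted values on a matching line, including any that sit outside the foo(....) call.

Here \047 is the octal escape for a single quote.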

awk '/foo\([^][]*\)/ {
  while(match($0, /\047[[:alnum:]_]+\047/)) {
    print substr($0, RSTART+1, RLENGTH-2)
    $0 = substr($0, RSTART + RLENGTH)
  }
}' file

Upvotes: 2
