How to optimize this regex

Question

Can someone help me optimize my regex patterns so I don't have to run each of the regexes below? Ideally a single pattern would match all of the strings in the example I provided.

$pattern = "/__\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

Example:

_e('string');
_e("string");
_e('string', 'string2');
_e("string", 'string2');
__('string');
__("string");
__('string', 'string2');
__("string", 'string2');

Also, if possible, I would like to match the strings below too.

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

If possible, I would like to get the value string2 too. In the worst case, single and double quotes are mixed within the same file.

As you can see in my preg_match_all code, I currently use 8 patterns for the first set and another 8 patterns for the second set, just to get the first string.

Note: I only run this script as a console command, not in a PHP application, so performance doesn't matter.

Thank you for your help!

Edited

Thank you for the responses. I tried both of your regexes and they are almost there. My question might be confusing, as I am not a native English speaker. I copied my test cases to regex101, which might make it easier to understand what I am trying to achieve.

https://regex101.com/r/uX5nqR/2

and this one too

https://regex101.com/r/Fxs7yY/1

Please check these out. I am trying to extract translations from a WordPress project and also from Twig files that use the "trans" filter. I know there are .mo/.po editors, but the editor doesn't recognize the file extension I use.

Fiona Runge · Accepted Answer

I took the liberty of writing this in JavaScript, but the regex will work the same.

My complete code looks like this:

const r = /^_[e_]\(("(.*)"|\'(.*)\')(, ("(.*)"|\'(.*)\'))?\);$/;

const xs = [
  "_e('string');",
  "_e(\"string\");",
  "_e('string', 'string2');",
  "_e(\"string\", 'string2');",
  "__('string');",
  "__(\"string\");",
  "__('string', 'string2');",
  "__(\"string\", 'string2');",
];

xs.forEach((x) => {
  const matches = x.match(r);

  if(matches){
    console.log('matches are:\n ', matches.filter(m => m !== undefined).join('\n  '));
  }else{
    console.log('no matches for', x);
  }
});

Now let me explain how the regex works and how I arrived at it: First I noticed that all your strings start with _ and end with );, so I knew the regex had to look something like ^…\);$. Here ^ and $ mark the beginning and end of the string, and you should leave them out if they're not required.

After the initial _ you've got either another _ or an e, so we put these into a character class followed by the opening parenthesis: _[e_]\(.

Now we have a string that is either in " or in ', and we put it down as alternatives: ("(.*)"|\'(.*)\').

This string is repeated, but optionally, with a leading comma and space in front. So we get (, …)? for the optional part, and (, ("(.*)"|\'(.*)\'))? for the whole second portion.
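The step-by-step construction above can be sketched by assembling the pattern from named pieces. This is only an illustration, and the names `prefix`, `quoted`, `optionalSecond`, `suffix` and `firstAndSecond` are made up for it. One deliberate difference from the regex above: the sketch uses negated character classes (`[^"]*`, `[^']*`) instead of `.*`, because a greedy `.*` can run across the comma when both arguments use the same kind of quote.

```javascript
// Hypothetical building blocks, one per step of the explanation.
const prefix = String.raw`_[e_]\(`;                 // "_", then "_" or "e", then a literal "("
const quoted = String.raw`("([^"]*)"|'([^']*)')`;   // a double- or single-quoted string
const optionalSecond = `(, ${quoted})?`;            // optional ", <string>"
const suffix = String.raw`\);`;                     // a literal ");"

// Anchored version for testing one call per line.
const composed = new RegExp(`^${prefix}${quoted}${optionalSecond}${suffix}$`);

// Returns the first string and the optional second string, or null if
// the line does not look like a _e(...) / __(...) call.
function firstAndSecond(line) {
  const m = line.match(composed);
  if (!m) return null;
  // Groups 2/3 hold the first string, groups 6/7 the optional second one.
  return { first: m[2] ?? m[3], second: m[6] ?? m[7] ?? null };
}

console.log(firstAndSecond("_e('string', 'string2');"));
console.log(firstAndSecond('__("string");'));
```

Keeping the pieces separate makes it easy to drop the `^` and `$` anchors later, or to reuse `quoted` for the `|trans` variant.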

For the second portion of your problem you can use the same strategy:

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

Start building up your regex from the similarities. We've got the same string pattern as before used twice, and the optional second part now looks like (\(\{\}, ("(.*)"|\'(.*)\')\))?.

This way we can end up with a regex like this:

^("(.*)"|\'(.*)\')\|trans(\(\{\}, ("(.*)"|\'(.*)\')\))?$

Please note that this regex is not tested, but just a guess from my side.
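Since that regex is untested, here is a small sketch for checking it against the six sample lines from the question. It assumes each sample sits alone on its own line (so the `^` and `$` anchors apply) and uses my reading of the pattern with the parentheses balanced; `transRegex`, `transSamples` and `parseTrans` are names invented for the example.

```javascript
// The |trans pattern with balanced parentheses (an assumption about the intended form).
const transRegex = /^("(.*)"|'(.*)')\|trans(\(\{\}, ("(.*)"|'(.*)')\))?$/;

// The six sample lines from the question.
const transSamples = [
  `"string"|trans`,
  `'string'|trans`,
  `"string"|trans({}, "string2")`,
  `'string'|trans({}, 'string2')`,
  `'string'|trans({}, "string2")`,
  `"string"|trans({}, 'string2')`,
];

// Returns the first string and the optional second string, or null on no match.
function parseTrans(line) {
  const m = line.match(transRegex);
  if (!m) return null;
  // Groups 2/3 hold the first string, groups 6/7 the optional second one.
  return { first: m[2] ?? m[3], second: m[6] ?? m[7] ?? null };
}

transSamples.forEach((s) => console.log(s, "=>", parseTrans(s)));
```

This only covers one expression per line; it says nothing yet about several matches inside a larger file.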

Upon further discussion it became apparent that we're looking at several matches in a larger bunch of text. To adapt to this we need to exclude the ' and " characters from the innermost groups, which leaves us with these regexes:

_[e_]\(("([^"]*)"|\'([^']*)\')(, ("([^"]*)"|\'([^']*)\'))?\);
("([^"]*)"|\'([^']*)\')\|trans(\(\{\}, ("([^"]*)"|\'([^']*)\')\))?

I've also noted that my second regex apparently had an unmatched parenthesis in it.
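To show how the unanchored versions could be used on a larger piece of text, here is a rough sketch using the global flag and `String.prototype.matchAll`. The sample text and the names `wpRegex`, `twigRegex`, `sampleText` and `extract` are made up for the illustration, and both patterns use the negated character classes described above.

```javascript
// Unanchored patterns with the "g" flag so matchAll can find every occurrence.
const wpRegex = /_[e_]\(("([^"]*)"|'([^']*)')(, ("([^"]*)"|'([^']*)'))?\);/g;
const twigRegex = /("([^"]*)"|'([^']*)')\|trans(\(\{\}, ("([^"]*)"|'([^']*)')\))?/g;

// Invented sample input mixing WordPress calls and Twig filters.
const sampleText = [
  "<h1><?php _e('Hello', 'my-domain'); ?></h1>",
  '<p><?php echo __("Bye"); ?></p>',
  "{{ 'Welcome'|trans }} {{ \"Total\"|trans({}, 'shop') }}",
].join("\n");

// Collects { first, second } for every match of the given pattern.
function extract(text, regex) {
  return [...text.matchAll(regex)].map((m) => ({
    // Both patterns share the same group layout:
    // 2/3 = first string, 6/7 = optional second string.
    first: m[2] ?? m[3],
    second: m[6] ?? m[7] ?? null,
  }));
}

console.log(extract(sampleText, wpRegex));
console.log(extract(sampleText, twigRegex));
```

Strings that contain escaped quotes (for example `'don\'t'`) are not handled by these patterns and would need a more careful string sub-pattern.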
