user3392555

Reputation: 63

How to optimize this regex

Can someone help me optimize my regex pattern so that I don't have to run each of the regexes below? A single pattern should match all strings like the examples I provide.

$pattern = "/__\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/__\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\"/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\"(.*)\",/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\'/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

$pattern = "/_e\(\'(.*)\',/";
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

Example:

_e('string');
_e("string");
_e('string', 'string2');
_e("string", 'string2');
__('string');
__("string");
__('string', 'string2');
__("string", 'string2');

If possible, I would also like to match the strings below.

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

If possible, I would like to get the value string2 too. In the worst case, single and double quotes are mixed within the same file.

As you can see in my preg_match_all code, I currently use 8 patterns for the first set, and another 8 patterns for the second set, just to get the first string.

Note: I only run this script from the console, not inside a PHP application, so performance doesn't matter.

Thank you for your help!

Edited

Thank you for the responses. I tried both of your regexes and they are almost there. My question might have been confusing, as I am not a native English speaker. I have put my examples on regex101, which might make it easier to understand what I am trying to achieve.

https://regex101.com/r/uX5nqR/2

and this one too

https://regex101.com/r/Fxs7yY/1

Please check these out. I am trying to extract translations from a WordPress project and also from Twig files that use the "trans" filter. I know there are .mo/.po editors, but the editor doesn't recognize the file extension I use.

Upvotes: 0

Views: 120

Answers (2)

Fiona Runge

Reputation: 2311

I took the liberty of writing this in JavaScript, but the regex will work the same.

My complete code looks like this:

const r = /^_[e_]\((\"(.*)\"|\'(.*)\')(, (\"(.*)\"|\'(.*)\'))?\);$/;

const xs = [
  "_e('string');",
  "_e(\"string\");",
  "_e('string', 'string2');",
  "_e(\"string\", 'string2');",
  "__('string');",
  "__(\"string\");",
  "__('string', 'string2');",
  "__(\"string\", 'string2');",
];

xs.forEach((x) => {
  const matches = x.match(r);

  if(matches){
    console.log('matches are:\n ', matches.filter(m => m !== undefined).join('\n  '));
  }else{
    console.log('no matches for', x);
  }
});

Now let me explain how the regex works and how I arrived at it: First I noticed that all your strings start with _ and end with );, so I knew the regex had to look something like ^…\);$. Here ^ and $ mark the beginning and end of the string, and you should leave them out if they're not required.

After the initial _ you've got either another _ or an e, so we put these into a character class followed by the opening parenthesis: [e_]\(.

Now we have a string that is either in " or in ', and we put it down as alternatives: (\"(.*)\"|\'(.*)\').

The string pattern then repeats, optionally, with a leading comma and space in front. So we get (, …)? for the optional part, and (, (\"(.*)\"|\'(.*)\'))? for the whole second portion.
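
As a rough sketch of the steps above, the same regex can be assembled from its pieces and the two strings read back out of the capture groups. The variable names (quoted, composed, first, second) are only illustrative and not part of the original code.

```javascript
// A quoted string: either "..." or '...'. Each use adds three capture groups.
const quoted = `("(.*)"|'(.*)')`;

// ^_ + [e_] + \( + string + optional ", string" + \);$
const composed = new RegExp(`^_[e_]\\(${quoted}(, ${quoted})?\\);$`);

const m = `_e("string", 'string2');`.match(composed);

// Groups: 2/3 = first string (double/single quoted),
//         6/7 = second string (double/single quoted), undefined if absent.
const first = m[2] ?? m[3];
const second = m[6] ?? m[7];

console.log(first, second);
```

Only one of each pair of groups is filled for a given match, which is why the original code filters out the undefined entries.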


For the second portion of your problem you can use the same strategy:

"string"|trans
'string'|trans
"string"|trans({}, "string2")
'string'|trans({}, 'string2')
'string'|trans({}, "string2")
"string"|trans({}, 'string2')

Start building up your regex from the similarities. We've got the same string pattern as before used twice, and the optional second part now looks like (\(\{\}, (\"(.*)\"|\'(.*)\')\))?.

This way we can end up with a regex like this:

^(\"(.*)\"|\'(.*)\')\|trans\(\{\}, (\"(.*)\"|\'(.*)\')\))?$

Please note that this regex is not tested, but just a guess from my side.
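
A quick way to check the guess is to run it against the six sample strings. The sketch below assumes that an opening parenthesis is added before \(\{\} so that the optional group is balanced; the names transRe, transSamples and transResults are made up for the example.

```javascript
// Assumption: "(" added before "\(\{\}" so the optional group is balanced.
const transRe = /^("(.*)"|'(.*)')\|trans(\(\{\}, ("(.*)"|'(.*)')\))?$/;

const transSamples = [
  `"string"|trans`,
  `'string'|trans`,
  `"string"|trans({}, "string2")`,
  `'string'|trans({}, 'string2')`,
  `'string'|trans({}, "string2")`,
  `"string"|trans({}, 'string2')`,
];

const transResults = transSamples.map((s) => {
  const m = s.match(transRe);
  if (!m) return null; // no match for this sample
  // Groups 2/3 hold the first string, 6/7 the optional second one.
  return { first: m[2] ?? m[3], second: m[6] ?? m[7] ?? null };
});

console.log(transResults);
```

With the anchors in place this only works when each sample is matched on its own, one string at a time.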


Upon further discussion it became apparent that we're looking at several matches in a larger bunch of text. To adapt to this we need to exclude the ' and " characters from the innermost groups, which leaves us with these regexes:

_[e_]\(("([^"]*)"|\'([^']*)\')(, ("([^"]*)"|\'([^']*)\'))?\);
("([^"]*)"|\'([^']*)\')\|trans(\(\{\}, ("([^"]*)"|\'([^']*)\')\))?

I've also noticed that my second regex had an unmatched parenthesis in it: the opening ( of the optional group before \(\{\} was missing. It is restored in the version above.
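
To illustrate why the quote characters have to be excluded, here is a small sketch that scans a piece of text containing several calls. The input text and the variable names are invented for the example; the g flag and matchAll are simply one way to collect every occurrence in JavaScript.

```javascript
// Hypothetical input: two translation calls and one unrelated call on a line.
const text = `echo __('Hello'); _e("Bye", 'domain'); $x = foo('skip');`;

// Greedy version: (.*) runs on to the last quote that still allows a match.
const greedyRe = /_[e_]\(("(.*)"|'(.*)')(, ("(.*)"|'(.*)'))?\);/;
const greedyFirst = text.match(greedyRe)[3];
console.log(greedyFirst); // much more than just Hello

// Quote-excluding version, with the g flag to find every call.
const callRe = /_[e_]\(("([^"]*)"|'([^']*)')(, ("([^"]*)"|'([^']*)'))?\);/g;

const calls = [...text.matchAll(callRe)].map((m) => ({
  first: m[2] ?? m[3],
  second: m[6] ?? m[7] ?? null,
}));

console.log(calls);
```

A string that contains an escaped quote of its own kind would still be cut short by this pattern, so treat it as a starting point.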

Upvotes: 1

dkellner

Reputation: 9966

I tried to understand the purpose of these regexes - here's what I think. (Let me omit the slashes on both sides, also the string quotes belonging to the language instead of the regex itself.)

(__|_e)\(\"(.*)\"
(__|_e)\(\'(.*)\'

This way you get all the hits of your 8 regexes above; but that's probably not what you were trying to achieve.
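
A small JavaScript sketch (not part of the original answer, names invented) shows what these two patterns capture on the eight example lines, including the case where the greedy (.*) takes in both arguments at once:

```javascript
const dq = /(__|_e)\("(.*)"/; // double-quoted first argument
const sq = /(__|_e)\('(.*)'/; // single-quoted first argument

const examples = [
  `_e('string');`,
  `_e("string");`,
  `_e('string', 'string2');`,
  `_e("string", 'string2');`,
  `__('string');`,
  `__("string");`,
  `__('string', 'string2');`,
  `__("string", 'string2');`,
];

// Try the double-quote pattern first, then the single-quote one.
const hits = examples.map((line) => {
  const m = line.match(dq) ?? line.match(sq);
  return m ? m[2] : null;
});

console.log(hits);
```

Every line produces a hit, but for the lines with two single-quoted arguments the capture spans from the first opening quote to the last closing quote.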

As far as I understand, you want to list the I18N refs in your code, with one or more arguments between the brackets. I think the best way to do it is run a preg_match_all with the simplest form of the pattern:

(__|_e)\(.*\)

or maybe this one is better:

(__|_e)\([^\)]+\)     // works for multiple calls in one line, ignores empties

...and then iterate the results one by one and split them by comma:

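// Capture the argument list in its own group so it is available as $m[2].
preg_match_all('/(__|_e)\(([^)]+)\)/', $content, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[1] = function name (__ or _e), $m[2] = raw argument list
    $args = array_map('trim', explode(',', $m[2]));
    // $args now holds the (still quoted) arguments of this function call;
    // note that a comma inside a string would also be split here
}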
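
For comparison, here is a rough JavaScript sketch of the same match-then-split idea. It assumes the argument list is captured in a second group, and the sample input and variable names are made up; stripping the surrounding quotes is an extra step added for illustration.

```javascript
// Hypothetical input with two calls on one line.
const source = `_e('string', "string2"); __("solo");`;

// Group 1 = function name, group 2 = everything between the parentheses.
const argRe = /(__|_e)\(([^)]+)\)/g;

const parsed = [...source.matchAll(argRe)].map((m) => ({
  fn: m[1],
  // Split on commas, trim whitespace, then drop one pair of matching quotes.
  args: m[2]
    .split(',')
    .map((a) => a.trim().replace(/^(['"])(.*)\1$/, '$2')),
}));

console.log(parsed);
```

Like the PHP loop, this breaks on strings that themselves contain a comma or a closing parenthesis, so it suits simple translation keys best.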

If this answer is not helping, let's refine the question :)

Upvotes: 0
