clamport

Reputation: 311

Matching regular expressions

I have a regular expression that is basically meant to update log4j syntax to log4j2 syntax by removing the string concatenation. The regular expression is as follows:

(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)

This successfully matches the variables in the following strings:

("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);

But it does not match '+ thingCollection.get(0).getMyId()' in:

("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);

I am getting better with regular expressions, but this one has me a bit stumped. Thanks!

Upvotes: 1

Views: 83

Answers (3)

user557597

You might be able to pare it down to this:

(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)

It consolidates some constructs.

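To fix the immediate problem, I added a lookahead for a comma, (?=,), as an alternative in the trailing group.
Note that this kind of regex is fragile: the way matching flows from one alternative into the next makes it prone to problems.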

 (?:
      ^ \( \s* 
   |  \s* \+ \s* 
   |  , \s* 
 )
 (?:
      [\w().\s+]+ 
   |  \( [\w().\s+-]* \) 
 )
 (?:
      (?= , )
   |  \s* \+ \s* 
   |  \s* \); 
 )
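
As a rough check of that change, here is a minimal sketch that runs both patterns against the line from the question that failed. It assumes Python's re module, which is my choice since the question does not name a regex engine; the names ORIGINAL, PARED_DOWN and LINE are only for illustration.

```python
import re

# Pattern from the question, copied verbatim.
ORIGINAL = re.compile(
    r'(?:^\(\s*|\s*\+\s*|,\s*)'
    r'(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))'
    r'(?:\s\+\s*|\s*\);)'
)

# Pared-down pattern from this answer, copied verbatim.
PARED_DOWN = re.compile(
    r'(?:^\(\s*|\s*\+\s*|,\s*)'
    r'(?:[\w().\s+]+|\([\w().\s+-]*\))'
    r'(?:(?=,)|\s*\+\s*|\s*\);)'
)

# The line the original pattern could not handle.
LINE = '("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'

# Neither pattern has capture groups, so findall returns the full matches.
old_matches = ORIGINAL.findall(LINE)
new_matches = PARED_DOWN.findall(LINE)

print(old_matches)  # [', e);']
print(new_matches)  # ['+ thingCollection.get(0).getMyId()', ', e);']
```

The original has no trailing alternative that accepts a comma, so the method chain is skipped; the (?=,) lookahead accepts it without consuming the comma, which leaves the comma free to start the next match.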

Upvotes: 0

Borodin

Reputation: 126722

For some reason, when some people are writing a regex pattern, they forget that the whole of the Perl language is still available.

I would just delete all the strings and find the remaining substrings that look like variable names:

use strict;
use warnings 'all';
use feature qw/ say fc /;

use List::Util 'uniq';

my @variables;

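while ( <DATA> ) {
    s/"[^"]*"//g;                       # delete every double-quoted string literal
    push @variables, /\b[a-z]\w*/ig;    # collect the identifiers that remain
}

# print each distinct name once, sorted case-insensitively
say for sort { fc $a cmp fc $b } uniq @variables;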

__DATA__
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);

output

address
countInUse
e
endTime
get
getMyId
otherThingId
secondThingId
size
startTime
thingCollection
thingId
things

Upvotes: 1

jjspace

Reputation: 187

You should be able to simplify your regex to match things in between '+' signs.

(?:\+)([^"]*?)(?:[\+,])

(Note the ? after the *: it makes the * lazy, so it matches as little as possible and catches every occurrence.)

If you want just the variable, read the first capture group from that expression; if you want the full match, ignore the capture group.


Updated version:

(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);

Note that with the new version the variable might be placed in capture group 2 instead of 1.
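
Here is a minimal sketch of reading whichever group matched. It assumes Python's re module, which is my choice and not something the question specifies; extract_variables is a hypothetical helper name. Group 1 keeps the spaces around the variable, so the sketch trims them.

```python
import re

# Updated pattern from this answer, copied verbatim.
UPDATED = re.compile(r'(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);')


def extract_variables(line):
    """Return the captured text of every match, with whitespace trimmed."""
    found = []
    for m in UPDATED.finditer(line):
        # Group 1 belongs to the first alternative, group 2 to the second;
        # the group that took no part in the match is None.
        captured = m.group(1) if m.group(1) is not None else m.group(2)
        found.append(captured.strip())
    return found


# Two of the sample lines from the question.
line_a = '("Count in use for thing=" + secondThingId + " is " + countInUse);'
line_b = '("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);'

print(extract_variables(line_a))  # ['secondThingId', 'countInUse']
print(extract_variables(line_b))  # ['thingCollection.get(0).getMyId()', 'e']
```

In the first line, countInUse comes from group 2 because nothing but ); follows it; the exception argument e in the second line is picked up the same way.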

Upvotes: 0
