Reputation: 311
I have a regular expression that I'm using to update log4j syntax to log4j2 syntax by removing the string concatenation. The regular expression is as follows:
(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)
It successfully matches the variables in the following strings:
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
But it doesn't match '+ thingCollection.get(0).getMyId()' in:
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
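One quick way to see where the pattern stops is to run it over two of the sample lines. Below is a minimal Perl sketch. The helper name matches_in is made up for illustration, and the sample lines are copied from above. On the second line the only match is ', e);'. After thingCollection.get(0).getMyId() the next character is a comma, but the last group only accepts whitespace followed by '+', or ');', at that point.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $re = qr/(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w\(\)\.\d+]*|\([\w\(\)\.\d+]*\s*(?:\+|-)\s*[\w\(\)\.\d+]*\))(?:\s\+\s*|\s*\);)/;

# Hypothetical helper: return every non-overlapping match of $re in one
# log statement (call it in list context).
sub matches_in {
    my ($line) = @_;
    return $line =~ /$re/g;
}

my @samples = (
    q{("Persisting " + things.size() + " new or updated thing(s)");},
    q{("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);},
);

for my $line (@samples) {
    print "$line\n";
    print "  matched: [$_]\n" for matches_in($line);
}
```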
I am getting better with regular expressions, but this one has me a bit stumped. Thanks!
Upvotes: 1
Views: 83
Reputation:
You might be able to pare it down to this:
(?:^\(\s*|\s*\+\s*|,\s*)(?:[\w().\s+]+|\([\w().\s+-]*\))(?:(?=,)|\s*\+\s*|\s*\);)
It consolidates some of the constructs.
To fix the immediate problem, I added a comma lookahead, (?=,), to the final group, so a match can also end right before a comma.
Be aware that this kind of regex is fragile. With this many alternations, the flow of the match can easily go somewhere you didn't intend.
Expanded (for use with the x / free-spacing flag):
 (?:
      ^ \( \s*
   |  \s* \+ \s*
   |  , \s*
 )
 (?:
      [\w().\s+]+
   |  \( [\w().\s+-]* \)
 )
 (?:
      (?= , )
   |  \s* \+ \s*
   |  \s* \);
 )
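The expanded form only works if it is compiled with the x (free-spacing) flag. Here is a rough Perl sketch that does that and runs it over two of the sample lines. The operands helper is illustrative and not part of the answer. So is the step that trims the delimiters the pattern consumes on either side (a leading '(', '+' or ',' and a trailing '+' or ');'). On the exception line, the (?=,) alternative lets the match end before the comma, so the method-call chain is now picked up. The ', e);' part is still matched separately.

```perl
use strict;
use warnings;

# The consolidated pattern from the answer, in its free-spacing (/x) layout.
my $re = qr/
    (?:
         ^ \( \s*
      |  \s* \+ \s*
      |  , \s*
    )
    (?:
         [\w().\s+]+
      |  \( [\w().\s+-]* \)
    )
    (?:
         (?= , )
      |  \s* \+ \s*
      |  \s* \);
    )
/x;

# Hypothetical helper: return each match with the delimiters consumed on
# either side (leading '(', '+' or ','; trailing '+' or ');') removed.
sub operands {
    my ($line) = @_;
    my @found;
    for my $raw ($line =~ /$re/g) {
        (my $m = $raw) =~ s/^\s*[(+,]\s*//;
        $m =~ s/\s*(?:\+|\);)\s*\z//;
        push @found, $m;
    }
    return @found;
}

for my $line (
    q{("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");},
    q{("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);},
) {
    print join(' | ', operands($line)), "\n";
}
```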
Upvotes: 0
Reputation: 126722
For some reason, when people write a regex pattern, they forget that the whole of the Perl language is still available.
I would just delete all the strings and then find the remaining substrings that look like variable names:
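use strict;
use warnings 'all';
use feature qw/ say fc /;     # fc needs Perl 5.16 or later
use List::Util 'uniq';        # uniq needs List::Util 1.45 or later
my @variables;
while ( <DATA> ) {
    s/"[^"]*"//g;                       # delete every string literal
    push @variables, /\b[a-z]\w*/ig;    # collect what's left that looks like an identifier
}
# print each name once, sorted case-insensitively
say for sort { fc $a cmp fc $b } uniq @variables;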
__DATA__
("Unable to retrieve things associated with this='" + thingId + "' in " + (endTime - startTime) + " ms");
("Persisting " + things.size() + " new or updated thing(s)");
("Count in use for thing=" + secondThingId + " is " + countInUse);
("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);
("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);
Output:
address
countInUse
e
endTime
get
getMyId
otherThingId
secondThingId
size
startTime
thingCollection
thingId
things
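The same idea can go one step further toward what the question is after: handle the string literals yourself rather than matching around them, then rebuild each statement in log4j2's parameterized {} form. This is only a sketch under stated assumptions. to_log4j2 is a hypothetical helper. String literals are assumed to contain no escaped quotes. Operands are assumed to contain no '+' or '"' of their own. A single trailing ', e' is assumed to be the Throwable argument.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): rebuild a log4j-style argument list
#   ("Count is " + count + " now");
# in log4j2's parameterized form
#   ("Count is {} now", count);
# Assumptions: no escaped quotes inside literals, no '+' or '"' inside operands,
# and an optional Throwable is the single argument after a trailing comma.
sub to_log4j2 {
    my ($stmt) = @_;
    my ($args) = $stmt =~ /^\s*\((.*)\);\s*\z/s or return $stmt;

    # Split off a trailing Throwable argument such as ", e".
    my $throwable;
    $throwable = $1 if $args =~ s/,\s*(\w+)\s*\z//;

    my $format = '';
    my @params;
    # Walk the '+'-separated pieces from left to right.
    while ($args =~ /\G\s*("[^"]*"|[^"+]+?)\s*(?:\+|\z)/gc) {
        my $piece = $1;
        if ($piece =~ /^"(.*)"\z/s) {
            $format .= $1;       # literal text goes straight into the format string
        } else {
            $format .= '{}';     # every other operand becomes a placeholder
            push @params, $piece;
        }
    }
    # Give up (return the input unchanged) if anything was left unparsed.
    return $stmt unless (pos($args) // 0) == length $args;

    my @out = ("\"$format\"", @params);
    push @out, $throwable if defined $throwable;
    return '(' . join(', ', @out) . ');';
}

print to_log4j2($_), "\n" for
    q{("Count in use for thing=" + secondThingId + " is " + countInUse);},
    q{("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);};
```

If the helper can't parse a statement, it returns that statement unchanged, so unusual cases are left for a manual edit rather than being rewritten incorrectly.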
Upvotes: 1
Reputation: 187
You should be able to simplify your regex to match things in between '+' signs.
(?:\+)([^"]*?)(?:[\+,])
(Note the ? after the *: it makes the * lazy, so it matches as little as possible and the pattern catches every occurrence.)
If you want just the variable, use the first capture group of that expression. If you want the full match, ignore the capture group.
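Here is a small Perl sketch of this first version. It collects group 1 from each match and trims the surrounding spaces. The name group1_values is made up for illustration. On the first sample only secondThingId comes back. countInUse is followed by ');' rather than '+' or ',', so this version never matches it.

```perl
use strict;
use warnings;

# First version of the pattern from this answer; group 1 holds the operand.
my $re = qr/(?:\+)([^"]*?)(?:[\+,])/;

# Hypothetical helper: collect group 1 from every match and trim its spaces.
sub group1_values {
    my ($line) = @_;
    my @vars = $line =~ /$re/g;    # list context: one group-1 value per match
    s/^\s+|\s+\z//g for @vars;
    return @vars;
}

for my $line (
    q{("Count in use for thing=" + secondThingId + " is " + countInUse);},
    q{("Exception occured while updating thingId="+ thingCollection.get(0).getMyId(), e);},
) {
    print join(' | ', group1_values($line)), "\n";
}
```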
Updated version:
(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);
Note that with the updated version, the variable may end up in capture group 2 instead of group 1. This happens when it is the last item before the closing ');'.
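Here is a matching Perl sketch for the updated version. It takes whichever group took part in each match. captured is a hypothetical helper name. On these samples, group 2 picks up the value directly before ');', which is countInUse on the first line and e on the second.

```perl
use strict;
use warnings;

# Updated pattern from this answer: group 1 for a value followed by '+' or ',',
# group 2 for a value that ends at ');'.
my $re = qr/(?:\+)([^"]*?)(?:[\+,])|\s([^"+]*?)\);/;

# Hypothetical helper: take whichever group participated in each match.
sub captured {
    my ($line) = @_;
    my @vars;
    while ($line =~ /$re/g) {
        my $v = defined $1 ? $1 : $2;
        $v =~ s/^\s+|\s+\z//g;    # trim the spaces the lazy group picked up
        push @vars, $v;
    }
    return @vars;
}

for my $line (
    q{("Count in use for thing=" + secondThingId + " is " + countInUse);},
    q{("Unable to check thing state '" + otherThingId + "' using '" + address + "'", e);},
) {
    print join(' | ', captured($line)), "\n";
}
```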
Upvotes: 0