Reputation: 13257
I want to find the word sprintf in my code. What Perl regular expression should I use?
Some lines contain text like sprintf_private, which I want to exclude; I need only sprintf.
Upvotes: 3
Views: 11225
Reputation: 13942
If you want to find all occurrences of sprintf on lines that do not contain sprintf_private, you might use a pair of regexes:
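while ( my $line = <DATA> ) {
    # Skip any line that mentions sprintf_private.
    next if $line =~ m/\bsprintf_private\b/;
    # Report every standalone sprintf on the remaining lines.
    while ( $line =~ m/\bsprintf\b/g ) {
        print "[sprintf] found on line $. at column $-[0]\n";
    }
}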
This first rejects any line containing sprintf_private. Lines that pass that check are then scanned for all occurrences of sprintf. For each match, a message is printed giving the line number in the file and the starting column of the match (a zero-based offset into the line).
The $. and @- special variables are described in perlvar, and good reading on regular expressions can be found in perlrequick and perlretut. The first regular expression is simple: it uses the \b zero-width assertion to ensure that the disqualifying substring has a word boundary on each side. The second regex uses the same technique but adds the /g modifier to iterate over all occurrences of sprintf, in case there is more than one per line.
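As a minimal sketch of the same two-regex approach in a form that can be run without a DATA section, the loop can be wrapped in a sub that takes the lines as a list. The name find_sprintf and the sample lines are made up for illustration; they are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, offset] pairs for every
# standalone "sprintf" on lines that do not mention sprintf_private.
sub find_sprintf {
    my @lines = @_;
    my @hits;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        # Disqualify the whole line if it mentions sprintf_private.
        next if $line =~ m/\bsprintf_private\b/;
        # /g in a while loop visits each match in turn; $-[0] is the
        # zero-based offset at which the current match starts.
        while ( $line =~ m/\bsprintf\b/g ) {
            push @hits, [ $line_no, $-[0] ];
        }
    }
    return @hits;
}

# Made-up sample input, purely for illustration.
my @sample = (
    'my $s = sprintf("%d", 1);',
    'sprintf_private("%d", 2);',
    'print sprintf("%s", sprintf("%d", 3));',
);

printf "line %d, offset %d\n", @$_ for find_sprintf(@sample);
```

Because the underscore is a \w character, \bsprintf\b on its own never matches inside sprintf_private; the first regex is only there to drop the whole line, including any other sprintf calls on it.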
The zero-width assertion \b matches wherever a \w\W or \W\w transition occurs (the start and end of the string count as \W). The character class \w contains all alphabetic characters (where what constitutes "all" depends on your unicode_strings feature, or the /u modifier), plus the underscore and numeric digits (i.e., whatever characters are permissible in an identifier), so you might find the \b word boundary too restrictive. If the "simple" solution is too naive an approach, you could go the extra mile and spell out exactly what should qualify as a word boundary by using a regex that looks like this:
(?<!\p{Alpha})sprintf(?!\p{Alpha})
If you choose to go this route, the solution would look like this:
while ( my $line = <DATA> ) {
    next if $line =~ m/(?<!\p{Alpha})sprintf_private(?!\p{Alpha})/;
    while ( $line =~ m/(?<!\p{Alpha})sprintf(?!\p{Alpha})/g ) {
        print "[sprintf] found on line $. at column $-[0]\n";
    }
}
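This uses zero-width negative lookbehind and zero-width negative lookahead assertions that reject a match when the character immediately to the left or right of the primary substring is an "Alpha" character, rather than using the slightly more naive \b. Note that underscores and digits then no longer count as part of the word, so sprintf followed by an underscore or a digit also matches; the line filter on sprintf_private is what keeps those lines out.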
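To see where the two boundary definitions disagree, the following sketch runs both patterns over a few made-up strings. The variable names and sample strings are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

# The two patterns discussed above, precompiled for comparison.
my $word_boundary    = qr/\bsprintf\b/;
my $alpha_lookaround = qr/(?<!\p{Alpha})sprintf(?!\p{Alpha})/;

# Made-up sample strings, chosen only to show where the patterns differ.
my @texts = ( 'sprintf(', 'sprintf_other(', '_sprintf(',
              'sprintf2(', 'xsprintf(', 'sprintfx(' );

for my $text (@texts) {
    printf "%-16s \\b: %-3s  lookaround: %s\n",
        $text,
        ( $text =~ $word_boundary    ? 'yes' : 'no' ),
        ( $text =~ $alpha_lookaround ? 'yes' : 'no' );
}
```

Both patterns accept a bare sprintf( and reject xsprintf( and sprintfx(. They differ on neighbours that are underscores or digits: \b treats those as word characters and rejects the match, while the \p{Alpha} lookarounds accept it. Which behaviour is right depends on what counts as a separate identifier in the code being searched.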
Upvotes: 7