Brad Allred

Reputation: 7534

How to count matches for a named capture group in Perl

I have a Perl command I cobbled together that runs a regex find-and-replace on a file. It works great, but it has the unfortunate side effect of "modifying" the file even when the resulting file is identical. This makes sense, since it replaces the matches with themselves. We can't have this, because the result is part of a make pipeline and the touched file causes an entire rebuild every time the command runs.

I would now like to run a command to get a count of matches for a specific named capture group so that I can test if anything needs to be replaced before actually running the first command.

The command is executed through bash with some bash variables:

    perl -0777 -i -pe '$cnt=0;s{('$PASSTHROUGH'|'$REPLACE')}{$+{PASSTHROUGH}?$+{PASSTHROUGH}:(++$cnt,'$REPLACEMENT')}peg; END{print "$cnt\n"}'

Again, this works great and gives me the number of actual replacements made, since $cnt is only incremented in the else branch of the ternary operator. If I were to match only the $REPLACE pattern, I would not get the correct number, since it would often match text that belongs to the $PASSTHROUGH group.
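Since -i rewrites every file it processes, one way to avoid touching unchanged files is to do the substitution in memory and write the file back only when the counter is non-zero. Below is a minimal sketch of that idea as a standalone script. The patterns are hypothetical stand-ins for the bash variables (quoted strings are passed through, the bare word `foo` is replaced with `bar`), and `replace_in_text` / `rewrite_if_needed` are illustrative names, not an existing API.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $PASSTHROUGH, $REPLACE and $REPLACEMENT:
# double-quoted strings are left alone, the word "foo" elsewhere becomes "bar".
my $PASSTHROUGH = qr/(?<PASSTHROUGH>"[^"]*")/;
my $REPLACE     = qr/(?<REPLACE>\bfoo\b)/;
my $REPLACEMENT = 'bar';

# Same substitution as the one-liner, applied to a string.
# Returns the new text and the number of real replacements.
sub replace_in_text {
    my ($text) = @_;
    my $cnt = 0;
    $text =~ s{$PASSTHROUGH|$REPLACE}{
        defined $+{PASSTHROUGH}
            ? $+{PASSTHROUGH}                # keep passthrough text as-is
            : do { ++$cnt; $REPLACEMENT }    # count and replace
    }ge;
    return ($text, $cnt);
}

# Rewrite $path only when at least one replacement was made, so an
# unchanged file keeps its modification time and make sees nothing new.
sub rewrite_if_needed {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$in> } // '';   # slurp, like -0777
    close $in;

    my ($new, $cnt) = replace_in_text($text);
    if ($cnt > 0) {
        open my $out, '>', $path or die "Cannot write $path: $!";
        print {$out} $new;
        close $out or die "Cannot close $path: $!";
    }
    return $cnt;
}

# Usage: perl rewrite_if_needed.pl FILE...
for my $path (@ARGV) {
    print "$path: ", rewrite_if_needed($path), " replacement(s)\n";
}
```

Testing with `defined` rather than truthiness also keeps a PASSTHROUGH match of `0` or an empty string from being sent to the replacement branch.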

I suspect there is a way to retrieve the match count for a specific group, but I don't know Perl or its terminology, so I am struggling to find out how to alter this command so that it does not replace anything and simply counts the matches of the $REPLACE sub-pattern only. It is a named group: (?<REPLACE>some-regex-pattern)
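For reference, a minimal sketch of counting only the matches in which the named group REPLACE took part, without replacing anything. It relies on `%+` holding only the named groups that matched in the current match. The patterns are hypothetical stand-ins for the bash variables, and `count_replace_matches` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $PASSTHROUGH and $REPLACE.
my $PASSTHROUGH = qr/(?<PASSTHROUGH>"[^"]*")/;
my $REPLACE     = qr/(?<REPLACE>\bfoo\b)/;

# m//g in scalar context advances one match per loop iteration.
# %+ contains only the named groups that matched this time, so
# $+{REPLACE} is defined only when the REPLACE branch matched.
sub count_replace_matches {
    my ($text) = @_;
    my $cnt = 0;
    while ($text =~ m{$PASSTHROUGH|$REPLACE}g) {
        $cnt++ if defined $+{REPLACE};
    }
    return $cnt;
}

my $sample = qq{foo "foo" foo\n};
print count_replace_matches($sample), "\n";   # prints 2
```

Assuming $REPLACE defines the named group REPLACE as described, the same idea as a one-liner would be: perl -0777 -ne '$cnt=0; while (m{('$PASSTHROUGH'|'$REPLACE')}g) { $cnt++ if defined $+{REPLACE} } print "$cnt\n"'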

Upvotes: 0

Views: 225

Answers (1)

Nahuel Fouilleul

Reputation: 19315

EDIT after question update

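  • -0777: the whole file is read at once (slurp mode; the input record separator $/ is undef)
  • -i: edit the file in place (like sed -i); it must be removed so that the file is not modified
  • -p: loops over the input and prints each record after running the code; -n loops the same way without printing, which is why it is used below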
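To see what -0777 changes, here is a small sketch that counts how many records Perl hands to the loop body for the same input, with the default $/ and with $/ undefined. It reads from an in-memory filehandle instead of a real file; `count_records` is an illustrative name. With a single record, a pattern can span lines and `$cnt=0` at the top of the loop body runs once per file.

```perl
use strict;
use warnings;

# Count the records read for a given $/ value.
# "\n" is the default (line mode); undef is what -0777 sets (slurp mode).
sub count_records {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = $separator;
    my $records = 0;
    $records++ while <$fh>;
    close $fh;
    return $records;
}

my $sample = "line one\nline two\nline three\n";
print count_records($sample, "\n"), "\n";    # 3: -n/-p without -0777
print count_records($sample, undef), "\n";   # 1: -0777 reads the whole file
```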

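The following command should just print the number of matches. The match runs in scalar context inside a while loop, so each match is counted once, however many capture groups $PASSTHROUGH and $REPLACE contain:

    perl -0777 -ne '$cnt=0; $cnt++ while m{('$PASSTHROUGH'(*SKIP)(?!)|'$REPLACE')}g; print "$cnt\n"'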
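When counting with m//g, context matters: in list context it returns every capture group of every match (including undef for groups that did not take part), so the element count is matches times groups. The sketch below shows the difference with a made-up pattern that has three groups, one plain and two named, similar in shape to ('$PASSTHROUGH'|'$REPLACE') when both variables contain named groups.

```perl
use strict;
use warnings;

my $text = 'foo foo foo';

# Three capture groups: the outer one plus two named ones.
my $re = qr/((?<PASSTHROUGH>bar)|(?<REPLACE>foo))/;

# List context: 3 groups per match x 3 matches.
my $list_count = () = $text =~ m/$re/g;

# Scalar context in a loop: one iteration per match.
my $match_count = 0;
$match_count++ while $text =~ m/$re/g;

print "list context: $list_count, scalar loop: $match_count\n";
# prints "list context: 9, scalar loop: 3"
```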

It works differently from the substitution:

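  • the principle of the alternation is to first match what must be skipped ($PASSTHROUGH) and make it fail, so that only what we want ($REPLACE) can produce a match
  • (*SKIP): a backtracking control verb; when the match fails after it, the engine does not retry at the positions already consumed but resumes the search after them, whereas normally it would backtrack and retry at the next character
  • (?!): an empty negative lookahead, which always fails; it is the same as (*FAIL)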
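A small demonstration of the (*SKIP)(?!) alternation, using hypothetical patterns in place of the bash variables: quoted text is the passthrough part, and the word `foo` is what should be counted. Matching $REPLACE alone also counts the `foo` inside the quotes; with the alternation it is skipped.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for $PASSTHROUGH and $REPLACE.
my $PASSTHROUGH = qr/(?<PASSTHROUGH>"[^"]*")/;
my $REPLACE     = qr/(?<REPLACE>\bfoo\b)/;

my $text = qq{foo "foo" foo\n};

# REPLACE alone: the "foo" inside the quotes is counted too.
my $naive = 0;
$naive++ while $text =~ m{$REPLACE}g;

# PASSTHROUGH consumes the quoted text, (*SKIP) marks the position after it,
# and (?!) forces a failure; the engine then resumes after the quoted text
# instead of retrying inside it, so only REPLACE can produce a match.
my $skipped = 0;
$skipped++ while $text =~ m{$PASSTHROUGH(*SKIP)(?!)|$REPLACE}g;

print "REPLACE alone: $naive, with (*SKIP)(?!): $skipped\n";
# prints "REPLACE alone: 3, with (*SKIP)(?!): 2"
```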

Upvotes: 3
