Reputation: 467
I'm trying to remove all the comments in a bunch of SGF files, and have come up with the following perl command:
perl -pi -e 's/P?C\[(?:[^\]\\]++|\\.)*+\]//gm' *.sgf
I'm trying to match and remove a C or PC followed by a left bracket, then any characters that aren't right brackets (a right bracket inside the comment has to be escaped with a \), and then a closing right bracket.
I'm trying to match the following examples:
C[HelloBot9 [-\]: GTP Engine for HelloBot9 (white): HelloBot version 0.6.26.08]
PC[IA [-\]: GTP Engine for IA (black): GNU Go version 3.7.11
]
C[person [-\]: \\\]]
C[AyaMC [3k\]: GTP Engine for AyaMC (black): Aya version 6.61 : If you pass, AyaMC
will pass. When AyaMC does not, please remove all dead stones.]
And some examples that shouldn't be matched:
XYZ[Other stuff \]]
C[stuff\]
PC[stuff\\\]
The regex works in several online regex testers (including a few that state they are perl regex testers), but for some reason doesn't work on the command line. Help is appreciated.
Upvotes: 3
Views: 546
Reputation: 626806
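You need to run perl with the -0777 option so that matches spanning multiple lines can be found. Without it, -p feeds the file to the substitution one line at a time, so a comment that opens on one line and closes on another is never seen as a whole. Using perl -0777pi -e instead of perl -pi -e will solve the issue.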
I would also suggest optimizing the pattern a bit by unrolling the alternation group, which makes the matching process "linear":
s/P?C\[[^]\\]*(?:\\.[^]\\]*+)*]//sg
Note that if PC should be matched as a whole word, add \b before P.
Pattern details:
P?C\[ - either the PC[ or the C[ literal char sequence
[^]\\]* - zero or more chars other than \ and ]
(?:\\.[^]\\]*+)* - zero or more sequences of:
    \\. - a literal \ and then any char (.)
    [^]\\]*+ - 0+ chars other than ] and \ (matched possessively, no backtracking into the pattern)
] - a literal ] symbol (note it does not have to be escaped outside the character class to denote a literal closing bracket)
Upvotes: 2