Reputation: 11772
I want to remove some words that fall between two patterns, using Perl.
The following is my text
..........
QWWK jhjh kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........
Now I want to remove all PQXY
words that lie between the two patterns
^QWWK
and KWWQ$
I know how to replace everything between the two patterns with the following command:
perl -0777pe 's/^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$/sometext/gms' filename
Also note that the pattern ^QWWK(?:(?!QWWK|KWWQ).)*KWWQ$
matches only blocks that have no QWWK or KWWQ in between.
Upvotes: 4
Views: 945
Reputation: 66883
Here is the approach you've tried, with the little extra that it needs in order to work:
perl -0777 -wpe's{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}{ $1 =~ s/PQXY//gr }egmsx' file
The /e
modifier makes it evaluate the replacement side as code, and we run a regex there.
In that regex the /r
modifier makes it return the changed string (and not change the original, which is what allows us to run it on $1
even though it is read-only).
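To see the two modifiers working together outside of a one-liner, here is a minimal sketch. The sub name strip_in_blocks and the sample string are made up for illustration; the substitution itself is the one from the command above. It needs Perl 5.14 or newer for /r.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): remove PQXY inside each
# ^QWWK ... KWWQ$ block of a multi-line string and return the result.
sub strip_in_blocks {
    my ($text) = @_;
    # /e evaluates the replacement side as code. Inside that code, /r
    # returns a modified copy of $1 instead of trying to change $1 itself.
    $text =~ s{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}{ $1 =~ s/PQXY//gr }egmsx;
    return $text;
}

# Assumed sample: one block (lines 2-3) with PQXY both inside and outside it.
my $sample = "a PQXY\nQWWK b PQXY\nc PQXY d KWWQ\ne PQXY\n";
print strip_in_blocks($sample);
```

With this sample only the two PQXY inside the block should disappear; the ones on the first and last lines stay.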
The requirement that the ^QWWK
-to-KWWQ$
block of text not contain either of these phrases is satisfied by the code above, but a few comments may be helpful.
We don't need the non-greedy .*?
since .*
(following the negative lookahead) actually stops at KWWQ$
. But this is tricky to ascertain, and .*
in general has the potential to slurp up everything up to the very last KWWQ
, including all other possible blocks and any text between them.
Altogether I just find .*?
safer and simpler, especially as that is what is needed.
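A small sketch that compares the variants discussed above. The helper count_blocks and the two-block sample text are assumptions for illustration only. The first two patterns keep the negative lookahead and differ only in greediness; the third drops the lookahead and uses a plain greedy .*

```perl
use strict;
use warnings;

# Hypothetical helper: count how many blocks a pattern finds in the text.
sub count_blocks {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;   # list context: one captured block per match
    return scalar @found;
}

my $lazy   = qr{^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$}msx;  # as in the answer
my $greedy = qr{^(QWWK (?:(?!QWWK|KWWQ).)*  KWWQ)$}msx;  # same, but greedy
my $plain  = qr{^(QWWK .* KWWQ)$}msx;                    # no lookahead at all

# Assumed sample: two separate blocks with an unrelated line between them.
my $text = "QWWK a KWWQ\nx\nQWWK b KWWQ\n";

print count_blocks($_, $text), "\n" for $lazy, $greedy, $plain;
```

With the lookahead in place, both the lazy and the greedy form should find the two blocks separately. The plain greedy .* should report a single match, because it runs from the first QWWK to the last KWWQ and swallows the line between the blocks.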
The QWWK
must start a line (it's given with ^
in the question) to be a marker for a block. If an extra QWWK
is found inside the block then the whole block does not match. But if that "extra" QWWK
inside happens to be at the beginning of a line, then two things follow.
First, what would've been a block doesn't match, since there is a QWWK inside it.
Second, a block is in fact matched, beginning with that inner QWWK.
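The two cases can be checked with a short sketch. The helper name matched_blocks and both sample strings are invented for this illustration; the pattern is the one used in the command above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the list of blocks that the pattern matches.
sub matched_blocks {
    my ($text) = @_;
    my @found = $text =~ /^(QWWK (?:(?!QWWK|KWWQ).)*? KWWQ)$/gmsx;
    return @found;
}

# Assumed samples: the extra QWWK sits in mid line vs. at the start of a line.
my $mid_line   = "QWWK a QWWK b\nc KWWQ\n";
my $line_start = "QWWK a\nQWWK b\nc KWWQ\n";

my @mid   = matched_blocks($mid_line);
my @start = matched_blocks($line_start);
print "mid-line extra QWWK:   ", scalar(@mid),   " block(s)\n";
print "line-start extra QWWK: ", scalar(@start), " block(s)\n";
```

In the first sample nothing should match at all. In the second, one block should match, and it starts at the second QWWK rather than the first.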
I use /x
above so as to be able to space out the pattern for readability.
Upvotes: 1
Reputation: 17041
If I understand your question correctly, this may be clearer with tools other than regexes. Note that the following collapses any run of whitespace between words to a single space.
Input qwwk.txt
(with one line added)
..........
QWWK jhjh kljdfh jklh jskdhf jkh PQXY
lhj ah jh sdlkjh PQXY jha slkdjh
PQXY jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........
KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ
Command perl qwwk.pl qwwk.txt
Output
..........
QWWK jhjh kljdfh jklh jskdhf jkh
lhj ah jh sdlkjh jha slkdjh
jh alkjh ljk
kjhaksj dkjhsd KWWQ
hahs dkj h PQXY
.........
KWWQ in mid line doesn't trigger: QWWK a PQXY b KWWQ c QWWK d PQXY e KWWQ
Program qwwk.pl
use strict; use warnings;
while(<>) {                         # for each line
    my @out;
    my @words = split;              # get its words
    for my $i (0..$#words) {
        my $w = $words[$i];
        my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ));
            # Keep track of where we are.  See notes below.
        push @out, $w unless $active and ($w eq q(PQXY));
            # Save words we want to keep
    } #foreach word
    print join(q( ), @out), qq(\n); # Print the words we saved
} #foreach line
The key is that the flip-flop (..
) operator in the $active= FOO .. BAR
assignment keeps its state regardless of what is happening around it. It will be true from
a QWWK
at the start of a line (($i==0 && $w eq q(QWWK))
) to a KWWQ
at the end of the line (($i==$#words && $w eq q(KWWQ))
), regardless of how many lines intervene.
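To watch the flip-flop keep its state from one line to the next, here is a stripped-down sketch that works on whole lines instead of words. The helper tag_lines and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix each line with "on" or "off" according to
# the flip-flop's state when that line is seen.
sub tag_lines {
    my @lines = @_;
    my @tagged;
    for my $line (@lines) {
        # Turns true on a line starting with QWWK and stays true through
        # the first following line that ends with KWWQ (inclusive).
        my $active = ($line =~ /^QWWK/) .. ($line =~ /KWWQ$/);
        push @tagged, ($active ? "on " : "off") . " $line";
    }
    return @tagged;
}

print "$_\n" for tag_lines("x", "QWWK a", "b", "c KWWQ", "y");
```

The three middle lines should come out tagged "on" and the outer two "off". One thing to keep in mind: the state belongs to the operator itself, so it also carries over between calls of a sub that contains it. If a call ends while the flip-flop is still "on", the next call starts "on" as well.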
As a one-liner
perl -Mstrict -Mwarnings -ne 'my @out; my @words=split; for my $i (0..$#words) { my $w=$words[$i]; my $active = ($i==0 && $w eq q(QWWK)) .. ($i==$#words && $w eq q(KWWQ)); push @out, $w unless $active and ($w eq q(PQXY)); } print join(q( ), @out), qq(\n);' qwwk.txt
The difference here is that -n
provides the while(<>){}
loop, so that's not included in the -e
script. (Plus, now you know why I used q()
and qq()
in the standalone program ;) .)
Upvotes: 1
Reputation: 2483
Update: To replace PQXY only if QWWK or KWWQ are NOT present between ^QWWK and KWWQ$ give this a try:
perl -pe 'if (/^QWWK/ .. /KWWQ$/) {s/PQXY//g if ! /.+QWWK/ && !/KWWQ.+/}' filename
I'm sure it can be cleaned up or golfed, but I think it will give you what you are asking for.
Upvotes: 1
Reputation: 241828
You can use the range operator:
perl -pe 's/PQXY//g if /^QWWK/ .. /KWWQ$/'
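The same logic, written out as a small script for anyone who wants to try it without a file. The sub strip_between and the sample lines are hypothetical; the statement inside the loop is the one-liner's body unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the one-liner's logic to a list of lines.
sub strip_between {
    my @lines = @_;              # work on a copy of the caller's lines
    for (@lines) {
        # Range operator in scalar context: true from a line matching
        # /^QWWK/ through the next line matching /KWWQ$/, inclusive.
        s/PQXY//g if /^QWWK/ .. /KWWQ$/;
    }
    return @lines;
}

print "$_\n" for strip_between("a PQXY", "QWWK PQXY b", "PQXY c KWWQ", "d PQXY");
```

Only the two middle lines should lose their PQXY. Note that this version makes no attempt to check for an extra QWWK or KWWQ inside the block, which the question's pattern does.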
Upvotes: 3