Reputation: 15703
I thought I had this figured out, but I want to find all occurrences in a file where I have some text to delete between two double quotes.
I need to find a match first, and then delete everything from the first double quote up to the match and on through to the second double quote. I don't want to simply match any text between two double quotes, as it may not be something in that file that I want to delete.
I used something like this:
perl -p -i.bak -e 's/bar/foo/g' bar.xml
first to do a find and replace that worked. Then I went to:
perl -p -i.bak -e 's/..\/..\/bar\///g' bar.xml
and that deleted everything up to bar, but I need to continue all the way to the second double quote and I'm not sure how to do that with Perl.
I assume it will be some regex mixed in, but nothing I've tried has worked. The part up to bar will always be the same, but the text after that point will vary. The part I want to delete will always end with the second double quote, and there will be more text after that.
Upvotes: 3
Views: 3406
Reputation: 53478
Your input says the file is .xml - so I'm going to say what I usually do.
Use an XML parser - I like XML::Twig because I think it's easier to get to grips with initially. XML::LibXML is good too.
Now, based on the question you're asking - it looks like you're trying to rewrite a file path within an XML attribute.
So:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
#my $twig = XML::Twig -> parsefile ( 'test.xml');
my $twig = XML::Twig -> parse ( \*DATA );
foreach my $element ( $twig -> get_xpath('element[@path]') ) {
    my $path_att = $element -> att('path');
    $path_att =~ s,/\.\./\.\./bar/,,g;
    $element -> set_att('path', $path_att);
}
$twig -> set_pretty_print('indented_a');
$twig -> print;
__DATA__
<root>
<element name="test" path="/path/to/dir/../../bar/some_dir">
</element>
<element name="test2" nopath="here" />
<element path="/some_path">content</element>
</root>
XML::Twig also quite usefully supports parsefile_inplace to work "sed style" to amend a file. The above is an illustration of the concept with some sample XML - with a clearer example of what you're trying to do, I should be able to improve it.
Upvotes: 0
Reputation: 132783
Some people were asking about escaped quotes. There are a couple of tricks here. You want to ignore escaped quotes like \", but not quote characters that have an escaped escape, like \\". To ignore the first, I use a negative lookbehind. To not ignore the second, I temporarily change all \\ to 😺. If you have 😺 in your data, choose something else.
use v5.14;
use utf8;
use charnames qw(:full);
# the output contains non-ASCII characters, so encode STDOUT as UTF-8
binmode STDOUT, ':encoding(UTF-8)';
my $regex = qr/
    (?<!\\) "   # a quote not preceded by a \ escape
    (.*?)       # anything, non greedily
    (?<!\\) "   # a quote not preceded by a \ escape
    /x;
while( <DATA> ) {
    # encode the escaped escapes for now
    s/(?:\\){2}/\N{SMILING CAT FACE WITH OPEN MOUTH}/g;
    print "$.: ", $_;
    while( m/$regex/g ) {
        my $match = $1;
        # decode the escaped escapes
        $match =~ s/\N{SMILING CAT FACE WITH OPEN MOUTH}/\\\\/g;
        say "\tfound ⇒ $match";
    }
}
__DATA__
"One group" and "another group"
This has "words between quotes" and words outside
This line has "an \" escaped quote" and other stuff
Start with \" then "quoted" and "quoted again"
Start with \" then "quoted \" with escape" and \" and "quoted again"
Start with \" then "quoted \\" with escape"
Start with \" then \\\\"quoted \\" with escape\\"
The output is:
1: "One group" and "another group"
found β One group
found β another group
2: This has "words between quotes" and words outside
found β words between quotes
3: This line has "an \" escaped quote" and other stuff
found β an \" escaped quote
4: Start with \" then "quoted" and "quoted again"
found β quoted
found β quoted again
5: Start with \" then "quoted \" with escape" and \" and "quoted again"
found β quoted \" with escape
found β quoted again
6: Start with \" then "quoted πΊ" with escape"
found β quoted \\
7: Start with \" then πΊπΊ"quoted πΊ" with escapeπΊ"
found β quoted \\
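To tie this back to the question, deleting (rather than listing) the quoted strings that contain a given piece of text could be sketched roughly as follows. This is not the code above: it swaps the temporary-placeholder trick for a single pattern that consumes each backslash escape as a two-character unit, both inside and outside quotes. The sub name `strip_quoted_with_escapes`, the `$marker` parameter, and the sample line are assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every double-quoted string that contains
# $marker, treating backslash escapes (\" and \\) as single units.
sub strip_quoted_with_escapes {
    my ( $line, $marker ) = @_;
    $line =~ s{
        ( \\. )                          # group 1: an escaped character outside quotes
      | ( " ( (?: [^"\\] | \\. )* ) " )  # group 2: whole quoted string, group 3: its content
    }{
        defined $1                 ? $1    # keep escapes outside quotes untouched
      : index( $3, $marker ) >= 0  ? ''    # drop quoted strings containing the marker
      :                              $2    # keep every other quoted string
    }gsex;
    return $line;
}

# Made-up sample line with escaped quotes inside both strings
my $line = q{keep "a \" b" but drop "x \" foo" here};
print strip_quoted_with_escapes( $line, 'foo' ), "\n";
# prints: keep "a \" b" but drop  here
```

Because the marker is checked with `index` rather than interpolated into the pattern, it is treated as literal text and needs no regex escaping.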
Upvotes: 2
Reputation: 336128
s/"[^"]*foo[^"]*"//g
works if there are no escaped quotes between the actual quotes, and if you want to remove a quoted string that contains foo:
" # Match a quote
[^"]* # Match any number of characters except quotes
foo # Match foo
[^"]* # Match any number of characters except quotes
" # Match another quote
Upvotes: 5