Reputation: 22160
Let's say I want to write a regular expression to change all <abc>, <def>, and <ghi> tags into <xyz> tags, and I also want to change their closing tags to </xyz>. This seems like a reasonable regex (ignore the backticks; StackOverflow has trouble with the less-than signs if I don't include them):
`s!<(/)?(abc|def|ghi)>!<${1}xyz>!g;`
And it works, too. The only problem is that for opening tags, the optional $1 variable gets assigned undef, and so I get a "Use of uninitialized value..." warning.
What's an elegant way to fix this? I'd rather not split this into two separate regexes, one for opening tags and another for closing tags, because then there are two copies of the taglist to maintain instead of just one.
Edit: I know I could just turn off warnings in this region of the code, but I don't consider that "elegant".
Upvotes: 1
Views: 1385
Reputation: 6014
You could just make your first match be (</?), and get rid of the hard-coded < on the "replace" side. Then $1 would always contain either < or </. There may be more elegant solutions to address the warning issue, but this one should handle the practical problem.
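A minimal sketch of that idea, run against a made-up sample string. The helper name `retag` is hypothetical and only there to make the substitution easy to call:

```perl
use strict;
use warnings;

# Hypothetical helper: capture "<" or "</" as a whole, so $1 is never undef.
sub retag {
    my ($text) = @_;
    # The tag list is wrapped in a non-capturing group; only $1 is needed.
    $text =~ s!(</?)(?:abc|def|ghi)>!${1}xyz>!g;
    return $text;
}

print retag("<abc>one</abc> <def>two</def>"), "\n";
```

Because the slash sits inside the capture, the group always takes part in the match, so no "uninitialized" warning should appear for opening tags.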
Upvotes: 1
Reputation: 551
Be careful, because HTML is a bit harder than it looks at first glance. For example, do you want to change "<abc foo='bar'>" to "<xyz foo='bar'>"? Your regex won't. Do you want to change "<img alt='<abc>'>"? The regex will. Instead, you might want to do something like this:
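use strict;
use warnings;
use HTML::TreeBuilder;

# Parse in two steps: by default HTML::TreeBuilder drops unknown tags such as <abc>.
my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);
$tree->parse_content("<abc>asdf</abc>");

for my $tag (qw<abc def ghi>) {
    # Rename every element that has this tag name.
    for my $elem ($tree->look_down(_tag => $tag)) {
        $elem->tag('xyz');
    }
}
print $tree->as_HTML;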
That keeps you from having to do the fiddly bits of parsing HTML yourself.
Upvotes: 0
Reputation: 47829
I'd rather not make this into two separate regexes, one for opening tags and another for closing tags, because then there are two copies of the taglist that need to be maintained
Why? Put your taglist into a variable and interpolate that variable into as many regexes as you like. I'd consider this even with a single regex, because it makes a complicated regex much more readable (and what regex isn't complicated?).
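Roughly what I have in mind, as a sketch. The names `@tags`, `$taglist`, and `retag_two_pass` are made up for the example, and the tag names are the ones from the question:

```perl
use strict;
use warnings;

# Assumed tag list, kept in one place and joined into an alternation.
my @tags    = qw(abc def ghi);
my $taglist = join '|', map { quotemeta($_) } @tags;

# Hypothetical helper: two substitutions that share the same tag list.
sub retag_two_pass {
    my ($text) = @_;
    $text =~ s!<(?:$taglist)>!<xyz>!g;      # opening tags
    $text =~ s!</(?:$taglist)>!</xyz>!g;    # closing tags
    return $text;
}

print retag_two_pass("<ghi>three</ghi>"), "\n";
```

There is still only one copy of the list to maintain, and neither regex has an optional capture, so nothing can come back undef.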
Upvotes: 0
Reputation: 41564
Here is one way:
s!<(/?)(abc|def|ghi)>!<$1xyz>!g;
Update: Removed irrelevant comment about using (?:pattern).
Upvotes: 1
Reputation: 168616
s!<(/?)(abc|def|ghi)>!<${1}xyz>!g;
The only difference is changing "(/)?" to "(/?)". You have already identified several functional solutions. This one has the elegance you asked for, I think.
Upvotes: 1
Reputation: 89055
Move the question mark inside the capturing bracket. That way $1 will always be defined, but may be a zero-length string.
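A small sketch to show the difference between the two placements. The helper `group1_defined` is hypothetical; it just reports whether $1 is defined after a successful match:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $1 is defined after the match, 0 otherwise.
sub group1_defined {
    my ($pattern, $text) = @_;
    $text =~ $pattern or die "pattern did not match: $text";
    return defined($1) ? 1 : 0;
}

my $outside = qr!<(/)?(abc|def|ghi)>!;   # quantifier applied to the group
my $inside  = qr!<(/?)(abc|def|ghi)>!;   # quantifier inside the group

print group1_defined($outside, "<abc>"), "\n";   # group skipped, $1 is undef
print group1_defined($inside,  "<abc>"), "\n";   # group matched the empty string
```

With the quantifier outside, the whole group can be skipped on an opening tag; with it inside, the group always takes part in the match and merely captures nothing.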
Upvotes: 10
Reputation: 3474
To make the regex capture $1 in either case, try:
s!<(/|)?(abc|def|ghi)>!<${1}xyz>!g;
     ^
note the pipe symbol, meaning '/' or ''
For '<abc>' this captures the empty string between '<' and 'abc>', and for '</abc>' it captures '/' between '<' and 'abc>'.
Upvotes: 0
Reputation: 1157
Add
no warnings 'uninitialized';
or
s!<(/)?(abc|def|ghi)>! join '', '<', ${1}||'', 'xyz>' !ge;
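If you go with the first option, the pragma can be confined to a bare block, since `no warnings` is lexically scoped. A sketch, with a made-up helper name `retag_quiet`:

```perl
use strict;
use warnings;

# Hypothetical helper: the original regex, with the warning silenced locally.
sub retag_quiet {
    my ($text) = @_;
    {
        # Lexically scoped: only this block ignores undef interpolation.
        no warnings 'uninitialized';
        $text =~ s!<(/)?(abc|def|ghi)>!<${1}xyz>!g;
    }
    return $text;
}

print retag_quiet("<abc>one</abc>"), "\n";
```

Warnings stay enabled for the rest of the file, so other uninitialized values are still reported.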
Upvotes: -1