I have an XML file, and in Perl I want to match all XML tags that have an attribute containing a certain string.
Sample XML:
<item attr="Car" />
<item attr="Apple_And_Pears.htm#123" />
<item attr="Paper" />
<item attr="Orange_And_Peach.htm#213" />
I want a regex that grabs all nodes that have an attribute containing ".htm", so that it matches only:
<item attr="Orange_And_Peach.htm#213" />
<item attr="Apple_And_Pears.htm#123" />
With the following regex, I match all tags rather than only the tags whose attribute contains .htm:
<item.*?attr="[^>]*>
Is there some sort of positive lookahead until a certain character?
Thanks
As Grinnz suggested, you should use an appropriate XML parser (see this post on Stack Overflow explaining why), but since you asked, here's a simple regex that uses a positive lookahead:
<item.*?attr=".*(?=\.htm).*
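A minimal sketch of applying that regex in Perl, assuming one tag per line as in the sample (the array and variable names are illustrative, and the fifth input line is a hypothetical tag added to show an edge case). Because `.*` does not stop at the closing quote of `attr`, the regex also matches when ".htm" appears in a different attribute. The second pattern is an assumed variant, not part of the answer: bounding the lookahead with `[^"]*` is one way to get "a lookahead until a certain character", since it can only look inside the `attr` value.

```perl
use strict;
use warnings;

# The four tags from the question, plus a hypothetical fifth tag where
# ".htm" appears in another attribute rather than in attr.
my @lines = (
    '<item attr="Car" />',
    '<item attr="Apple_And_Pears.htm#123" />',
    '<item attr="Paper" />',
    '<item attr="Orange_And_Peach.htm#213" />',
    '<item attr="Car" alt="x.htm" />',
);

# Regex from the answer: the lookahead only requires ".htm" somewhere after
# attr=" on the same line (. does not match a newline by default).
my @loose = grep { /<item.*?attr=".*(?=\.htm).*/ } @lines;

# Assumed tighter variant: [^"]* cannot cross the closing quote, so the
# lookahead only inspects the value of attr.
my @bounded = grep { /<item\b[^>]*?\battr="(?=[^"]*\.htm)[^"]*"[^>]*>/ } @lines;

print "loose:   $_\n" for @loose;
print "bounded: $_\n" for @bounded;
```

Here `@loose` also picks up the hypothetical `alt="x.htm"` tag, while `@bounded` keeps only the two tags from the question's expected output.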
If you want to match only tags (one per line) that contain exactly one ".htm", you can put a negative lookahead on each side of it:
^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$
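A short Perl sketch of how that pattern behaves, again assuming one tag per line; the third input line is hypothetical and only there to show a line with two occurrences being rejected.

```perl
use strict;
use warnings;

# Illustrative input: zero, one, and two occurrences of ".htm".
my @lines = (
    '<item attr="Paper" />',
    '<item attr="Apple_And_Pears.htm#123" />',
    '<item attr="a.htm#1" alt="b.htm" />',
);

# (?:(?!\.htm).)* consumes a character only where ".htm" does not start,
# so the following \.htm is the first occurrence on the line;
# (?!.*\.htm) then fails if a second ".htm" appears later.
my @exactly_one = grep { /^(?:(?!\.htm).)*\.htm(?!.*\.htm).*$/ } @lines;

print "$_\n" for @exactly_one;
```

Only the `Apple_And_Pears.htm#123` line survives: the first line has no ".htm" and the third has two.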
The appropriate Perl solution is not a regex but an XML parser. With Mojo::DOM (one of many options):
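use strict;
use warnings;
use Mojo::DOM;
use File::Slurper 'read_text';
# Read the whole file as text (read_text decodes as UTF-8 by default)
my $xml = read_text 'test.xml';
# xml(1) switches the parser to XML mode: no HTML semantics, case-sensitive
my $dom = Mojo::DOM->new->xml(1)->parse($xml);
# CSS selector: <item> elements whose attr attribute contains ".htm" anywhere
my $tags = $dom->find('item[attr*=".htm"]');
# find returns a Mojo::Collection; each element stringifies back to XML
print "$_\n" for @$tags;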