user2336315

Reputation: 16067

Don't understand my regex's matches

I'm currently reading XML tags from a file, but I have reduced the problem to this simple example.

#!/usr/bin/perl 

use strict;
use warnings;

my $str = '<tag x="20" y="7" x="15" z="14"/>';
if($str =~ /<tag.*(x|y|z)=\"(\d+)\".*(x|y|z)=\"(\d+)\".*(x|y|z)=\"(\d+)\".*\/>/){
    print "$1-$2\n";
    print "$3-$4\n";
    print "$5-$6\n";
}

As I understand my regex, the first x should match the first group, the first y the third group and the second x the fifth group.

So I expect as output:

x-20
y-7
x-15

But I get

y-7
x-15
z-14

Could someone explain what's happening here?

Upvotes: 0

Views: 44

Answers (2)

Shiplu Mokaddim

Reputation: 57650

Use \s+ instead of .* between the attributes, because what you actually want to match there is whitespace, not an arbitrary run of any characters.
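A minimal sketch of that change, using the string from the question. The sub name first_three is hypothetical, and the sketch assumes the attributes are separated only by whitespace. The final .* is kept so that any attributes left before the closing /> are still skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the first three name-value pairs of the tag.
sub first_three {
    my ($s) = @_;
    # \s+ can only span the whitespace between attributes, so each pair of
    # groups is forced onto the next attribute in order.
    my @m = $s =~ /<tag\s+(x|y|z)="(\d+)"\s+(x|y|z)="(\d+)"\s+(x|y|z)="(\d+)".*\/>/
        or return;
    return ("$m[0]-$m[1]", "$m[2]-$m[3]", "$m[4]-$m[5]");
}

my $str = '<tag x="20" y="7" x="15" z="14"/>';
print "$_\n" for first_three($str);   # prints x-20, y-7, x-15 on separate lines
```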

If this is really an assignment, you should do it properly. A regular expression is not the right tool for XML. Since it is an assignment, just write a parser. It's easier than you think.
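As a rough sketch of walking the attributes one at a time instead of using one big pattern: this is not a real XML parser, the sub name attribute_pairs is hypothetical, and it assumes the attribute shape from the question (a name of x, y or z with a double-quoted integer value).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collects every name-value pair in document order.
sub attribute_pairs {
    my ($s) = @_;
    my @pairs;
    # With /g in scalar context, each iteration resumes where the previous
    # match ended, so the attributes are visited left to right.
    while ($s =~ /\b(x|y|z)="(\d+)"/g) {
        push @pairs, "$1-$2";
    }
    return @pairs;
}

my $str = '<tag x="20" y="7" x="15" z="14"/>';
print "$_\n" for attribute_pairs($str);   # prints x-20, y-7, x-15, z-14 on separate lines
```

Because each attribute is matched on its own, there is no .* between them and greediness does not come into play.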

Upvotes: 1

mpapec

Reputation: 50637

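The * and + quantifiers are greedy by default: .* matches as many characters as possible and gives back only what the rest of the pattern needs. Your first .* therefore runs past x="20", and the groups end up capturing the last three attributes. Append ? to a quantifier to make it non-greedy: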

$str =~ /<tag.*?(x|y|z)=\"(\d+)\".*?(x|y|z)=\"(\d+)\".*?(x|y|z)=\"(\d+)\".*\/>/
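To see the difference side by side, here is a small sketch that runs both patterns on the string from the question. The variable names ($attr, @greedy, @lazy) are only illustrative. In list context the match returns the six captured groups.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str  = '<tag x="20" y="7" x="15" z="14"/>';
my $attr = qr/(x|y|z)="(\d+)"/;   # one attribute: name in one group, value in the next

# Greedy: each .* first swallows everything, then backs off only until the
# remaining attributes still fit, so the groups land on the LAST three.
my @greedy = $str =~ /<tag.*$attr.*$attr.*$attr.*\/>/;

# Non-greedy: each .*? grows one character at a time and stops at the
# first place where the next attribute matches.
my @lazy = $str =~ /<tag.*?$attr.*?$attr.*?$attr.*\/>/;

print "greedy: @greedy\n";   # greedy: y 7 x 15 z 14
print "lazy:   @lazy\n";     # lazy:   x 20 y 7 x 15
```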

Upvotes: 1
