user5858

Reputation: 1221

Perl m operator question

Why does this code print 51 and not 26? I'm trying to extract the "values", that is, the text inside value="..." such as Andaman & Nicobar in: <option value="Andaman & Nicobar">Andaman & Nicobar</option>

According to the documentation, shouldn't m with the /g modifier in list context return what the parenthesized part of the pattern matched?

my $firstpage=<<'EOF';
 <option value="Andaman & Nicobar">Andaman & Nicobar</option>
            <option value="Andhra Pradesh">Andhra Pradesh</option>
            <option value="Assam">Assam</option>
            <option value="Bihar">Bihar</option>
            <option value="Calcutta Telecom District">Calcutta Telecom District</option>
            <option value="Chennai Telecom District">Chennai Telecom District</option>
            <option value="Chhattisgarh">Chhattisgarh</option>
            <option value="Gujarat">Gujarat</option>
            <option value="Haryana">Haryana</option>
            <option value="Himachal Pradesh">Himachal Pradesh</option>
            <option value="Jammu & Kashmir">Jammu & Kashmir</option>
            <option value="Jharkhand">Jharkhand</option>
            <option value="Karnataka">Karnataka</option>
            <option value="Kerala">Kerala</option>
            <option value="Madhya Pradesh">Madhya Pradesh</option>
            <option value="Maharashtra">Maharashtra</option>
            <option value="North East I">North East I</option>
            <option value="North East II">North East II</option>
            <option value="Orissa">Orissa</option>
            <option value="Punjab">Punjab</option>
            <option value="Rajasthan">Rajasthan</option>
            <option value="Tamilnadu">Tamilnadu</option>
            <option value="UP East">UP East</option>
            <option value="UP West">UP West</option>
            <option value="Uttaranchal">Uttaranchal</option>
            <option value="West Bengal">West Bengal</option>
EOF

my @cities=$firstpage=~m{(?<=")([^"]*)(?=")}gs;

print scalar @cities;

Upvotes: 0

Views: 260

Answers (3)

user237419

Reputation: 9064

A better pattern in this case would be:

my @cities=($firstpage=~/value="([^"]+)"/gs);

Upvotes: 2

typo.pl

Reputation: 8942

The regex is grabbing what you think are the quoted cities, as well as the text between the end quote of one city and the beginning quote of the next city. I assume that if you don't make the match on the end double quote a zero-width assertion, your problem will go away.
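To make the extra matches visible, here is a minimal sketch that runs the question's pattern on a shortened two-option sample. The sample string and the variable names are assumptions made for illustration, not the original data.

```perl
use strict;
use warnings;

# Hypothetical shortened sample: two options instead of the full list.
my $sample = qq{<option value="Assam">Assam</option>\n<option value="Bihar">Bihar</option>\n};

# The pattern from the question: both quotes are zero-width assertions.
my @matches = $sample =~ m{(?<=")([^"]*)(?=")}gs;

# Print each match in brackets so the in-between text is easy to spot.
print "[$_]\n" for @matches;
print scalar(@matches), "\n";
```

The sample contains four double quotes, so this yields three matches: the two values plus the text lying between the first value's closing quote and the second value's opening quote.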

Upvotes: 1

ysth

Reputation: 98388

Each /g match starts where the previous one left off, but since you are using zero-width assertions, you never actually consume the " characters. So

">Andaman & Nicobar</option>
        <option value="

is considered a match too.

Do:

my @cities = $firstpage =~ m/"([^"]*)"/gs;

instead. Note that if there are capturing parentheses, only the contents of those are returned by m//g on success in list context.
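Both points can be checked with a small sketch on a shortened sample (the sample string and variable names are hypothetical, chosen only for illustration). Consuming the quotes pairs them up so that only the values match, and dropping the capturing parentheses makes m//g return the whole matches instead.

```perl
use strict;
use warnings;

# Hypothetical shortened sample, not the full list from the question.
my $sample = qq{<option value="Assam">Assam</option>\n<option value="Bihar">Bihar</option>\n};

# Quotes are consumed, so each match uses up one opening/closing pair.
my @captured = $sample =~ m/"([^"]*)"/g;

# Same pattern without capturing parentheses: list context returns whole matches.
my @whole = $sample =~ m/"[^"]*"/g;

print join(", ", @captured), "\n";   # the values only
print join(", ", @whole), "\n";      # the values with their surrounding quotes
```

With the capture group, the list holds just the two values. Without it, each element still carries its surrounding double quotes.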

Upvotes: 7
