Reputation: 1221
Why does this code print 51 and not 26? I'm trying to extract the "values", that is, the contents of the value attribute, for example Andaman & Nicobar in: <option value="Andaman & Nicobar">Andaman & Nicobar</option>
By definition, shouldn't m// with the /g modifier in list context return the text matched by the pattern in parentheses?
my $firstpage=<<'EOF';
<option value="Andaman & Nicobar">Andaman & Nicobar</option>
<option value="Andhra Pradesh">Andhra Pradesh</option>
<option value="Assam">Assam</option>
<option value="Bihar">Bihar</option>
<option value="Calcutta Telecom District">Calcutta Telecom District</option>
<option value="Chennai Telecom District">Chennai Telecom District</option>
<option value="Chhattisgarh">Chhattisgarh</option>
<option value="Gujarat">Gujarat</option>
<option value="Haryana">Haryana</option>
<option value="Himachal Pradesh">Himachal Pradesh</option>
<option value="Jammu & Kashmir">Jammu & Kashmir</option>
<option value="Jharkhand">Jharkhand</option>
<option value="Karnataka">Karnataka</option>
<option value="Kerala">Kerala</option>
<option value="Madhya Pradesh">Madhya Pradesh</option>
<option value="Maharashtra">Maharashtra</option>
<option value="North East I">North East I</option>
<option value="North East II">North East II</option>
<option value="Orissa">Orissa</option>
<option value="Punjab">Punjab</option>
<option value="Rajasthan">Rajasthan</option>
<option value="Tamilnadu">Tamilnadu</option>
<option value="UP East">UP East</option>
<option value="UP West">UP West</option>
<option value="Uttaranchal">Uttaranchal</option>
<option value="West Bengal">West Bengal</option>
EOF
my @cities=$firstpage=~m{(?<=")([^"]*)(?=")}gs;
print scalar @cities;
Upvotes: 0
Views: 260
Reputation: 9064
A better regex in this case would be:
my @cities=($firstpage=~/value="([^"]+)"/gs);
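To illustrate why anchoring on value= can be the safer choice, here is a minimal sketch. The sample markup, including the id attribute, is made up for illustration and is not from the question:

```perl
use strict;
use warnings;

# Hypothetical sample: the id attribute is invented to provide a second
# quoted string on each line.
my $html = <<'EOF';
<option id="o1" value="Assam">Assam</option>
<option id="o2" value="Bihar">Bihar</option>
EOF

# Only quoted strings directly preceded by value= are captured.
my @values = $html =~ /value="([^"]+)"/g;

# A bare quote-to-quote pattern also picks up the id attributes.
my @all_quoted = $html =~ /"([^"]*)"/g;

print scalar(@values), "\n";      # 2
print scalar(@all_quoted), "\n";  # 4
```

With the markup in the question there is only one attribute per tag, so both patterns would return the same list there; the difference only shows up once other quoted attributes appear.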
Upvotes: 2
Reputation: 8942
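The regex is grabbing what you think are the quoted cities, as well as the text between the closing quote of one city and the opening quote of the next city. Dropping only the zero-width assertion on the closing double quote would not be enough, because the lookbehind can still see the quote that was just consumed; the problem goes away if the pattern consumes both double quotes instead of asserting them.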
Upvotes: 1
Reputation: 98388
Each /g match starts where the previous one left off, but since you are using zero-width assertions, you aren't actually consuming the " characters, so the closing quote of one value also satisfies the lookbehind for the next match. So
">Andaman & Nicobar</option>
<option value="
is considered a match too.
Do:
my @cities = $firstpage =~ m/"([^"]*)"/gs;
instead. Note that if there are capturing parentheses, only the contents of those are returned by m//g on success in list context.
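A quick way to see the difference is to run both patterns on a shortened, two-option version of the page. This is only a sketch; the variable names @lookaround and @consuming are mine:

```perl
use strict;
use warnings;

# Two-option excerpt of the markup from the question.
my $page = <<'EOF';
<option value="Assam">Assam</option>
<option value="Bihar">Bihar</option>
EOF

# Zero-width assertions: the quotes are never consumed, so every gap
# between two consecutive quotes counts as a match.
my @lookaround = $page =~ m{(?<=")([^"]*)(?=")}g;

# Consuming both quotes: each match uses up one opening/closing pair.
my @consuming = $page =~ m/"([^"]*)"/g;

print scalar(@lookaround), "\n";  # 3
print scalar(@consuming), "\n";   # 2
print "[$_]\n" for @lookaround;   # the middle element is the text between the two values
```

With N options there are 2N quotes and therefore 2N-1 gaps between consecutive quotes, which is where the 51 for 26 options comes from.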
Upvotes: 7