Reputation: 51
my $text = '<span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small>';
if ($text =~ m/<span>by <small class="author" itemprop="author">(.+?)<\/small>/ig){
$author = $1;
$authorcount{$author} +=1;
}
$authorcounttxt = "authorcount.txt";
open (OUTPUT3, ">$authorcounttxt");
foreach $author (sort { $authorcount{$b} <=> $authorcount{$a} } keys %authorcount){
print OUTPUT3 ("$author\t\t$authorcount{$author}\n");
}
close (OUTPUT3);
The desired output is:
J.K. Rowling 3
However I am only getting:
J.K. Rowling 1
Upvotes: 1
Views: 100
Reputation: 6798
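As previous posters have already indicated, the issue lies in if ( $text =~ /.../gi ): the condition is evaluated once, it is true for the first match, and the block is executed only once.
You want to process every match, which can be done with a for loop (the match is evaluated in list context and returns all captures) or a while loop (the match is evaluated in scalar context and /g resumes after the previous match on each iteration).
The following code snippet demonstrates one of many possible approaches.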
use strict;
use warnings;
use feature 'say';
my(%authors, $fname, $text, $re);
$fname = 'authorcount.txt';
$text = '<span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small><span>by <small class="author" itemprop="author">J.K. Rowling</small>';
$re = qr/<span>by <small class="author" itemprop="author">(.*?)<\/small>/;
$authors{$1}++ for $text =~ /$re/gi;
open my $fh, ">", $fname
or die "Can't open $fname: $!";
say $fh "$_ $authors{$_}" for sort keys %authors;
close $fh;
NOTE: this code will work for your example $text = '...'. If you intend to process complex HTML files, then Mojo::DOM is the right tool for the job.
Upvotes: 1
Reputation: 106
Replace your if with a while to iterate through all of the matches of your regex instead of only the first one:
while ($text =~ m/<span>by <small class="author" itemprop="author">(.+?)<\/small>/ig){
$author = $1;
$authorcount{$author} += 1;
}
Also obligatory note: parsing HTML with regexen is fraught with peril. Consider using a module that can properly parse HTML, Mojo::DOM for example.
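For what it's worth, here is a minimal sketch of what the Mojo::DOM route could look like for the sample string. It assumes Mojolicious is installed from CPAN; the selector small.author and the %authorcount name are just taken from the question's markup and code, so adjust them for real pages.

```perl
use strict;
use warnings;
use Mojo::DOM;    # assumption: Mojolicious is installed from CPAN

# Same markup as in the question, repeated three times
my $text = '<span>by <small class="author" itemprop="author">J.K. Rowling</small>' x 3;

my %authorcount;
my $dom = Mojo::DOM->new($text);

# Select every <small class="author"> element with a CSS selector
# and count the text inside each one
$authorcount{ $_->text }++ for $dom->find('small.author')->each;

# Print authors, most frequent first, in the question's output format
printf "%s\t\t%d\n", $_, $authorcount{$_}
    for sort { $authorcount{$b} <=> $authorcount{$a} } keys %authorcount;
```

The selector matches on the element and its class rather than on the exact attribute order and spacing, so it should keep working when the markup changes slightly, which is where a regex tends to break.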
Upvotes: 1
Reputation: 123380
if ($text =~ m/.../ig){ $author = $1; $authorcount{$author} +=1;
This is an if statement, which means that the inner block will be entered at most once, i.e. only for the first match. You likely meant to use a while statement to enter the inner block for each match:
while ($text =~ m/.../ig){ $author = $1; $authorcount{$author} +=1;
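To see the difference side by side, here is a small self-contained sketch that runs both forms against the sample string from the question. The names %count_if and %count_while are made up for this illustration, and the pos() reset is only needed because both loops run on the same string.

```perl
use strict;
use warnings;

# Same markup as in the question, repeated three times
my $text = '<span>by <small class="author" itemprop="author">J.K. Rowling</small>' x 3;
my $re   = qr/<span>by <small class="author" itemprop="author">(.+?)<\/small>/i;

# if: the condition is tested once, so at most one match is counted
my %count_if;
if ($text =~ /$re/g) {
    $count_if{$1} += 1;
}

# The /g match above left a search position on $text; reset it so the
# while loop below starts again from the beginning of the string
pos($text) = undef;

# while: every iteration resumes after the previous match until none is left
my %count_while;
while ($text =~ /$re/g) {
    $count_while{$1} += 1;
}

print "if:    $count_if{'J.K. Rowling'}\n";       # prints "if:    1"
print "while: $count_while{'J.K. Rowling'}\n";    # prints "while: 3"
```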
Upvotes: 1