Reputation: 557
Lets say i have a file like below:
And i want to store all the decimal numbers in a hash.
hello world 10 20
world 10 10 10 10 hello 20
hello 30 20 10 world 10
i was looking at this
and this worked fine:
> perl -lne 'push @a,/\d+/g;END{print "@a"}' temp
10 20 10 10 10 10 20 30 20 10 10
Then what i need was to count number of occurrences of each regex.
for this i think it would be better to store all the matches in a hash and assign an incrementing value for each and every key.
so i tried :
perl -lne '$a{$1}++ for ($_=~/(\d+)/g);END{foreach(keys %a){print "$_.$a{$_}"}}' temp
which gives me an output of:
> perl -lne '$a{$1}++ for ($_=~/(\d+)/g);END{foreach(keys %a){print "$_.$a{$_}"}}' temp
10.4
20.7
Can anybody correct me whereever i was wrong?
the output i expect is:
10.7
20.3
30.1
although i can do this in awk,i would like to do it only in perl
Also order of the output is not a concern for me.
Upvotes: 3
Views: 2324
Reputation:
Another option would be this:
$a{$1}++ while ($_=~/(\d+)/g);
This does what I think you expected your code to do: loop over each successful match as the matches happen. Thus the $1
will be what you think it is.
Just to be clear about the difference:
The single argument for
loop in Perl means "do something for each element of a list":
for (@array)
{
#do something to each array element
}
So in your code, a list of matches was built first, and only after the whole list of matches was found did you have the opportunity to do something with the results. $1
got reset on each match as the list was being built, but by the time your code was run, it was set to the last match on that line. That is why your results didn't make sense.
On the other hand, a while loop means "check if this condition is true each time, and keep going until the condition is false". Therefore, the code in a while loop will be executed on each match of a regex, and $1
has the value for that match.
Another time this difference is important in Perl is file processing. for (<FILE>) { ... }
reads the entire file into memory first, which is wasteful. It is recommended to use while (<FILE>)
instead, because then you go through the file line by line and keep only the information you want.
Upvotes: 4
Reputation: 85767
$a{$1}++ for ($_=~/(\d+)/g);
This should be
$a{$_}++ for ($_=~/(\d+)/g);
and can be simplified to
$a{$_}++ for /\d+/g;
The reason for this is that /\d+/g
creates a list of matches, which is then iterated over by for
. The current element is in $_
. I imagine $1
would contain whatever was left in there by the last match, but it's definitely not what you want to use in this case.
Upvotes: 5