Reputation: 67
I am trying to find out the number of occurrences of "The/the". Below is the code I tried"
print ("Enter the String.\n");
$inputline = <STDIN>;
chop($inputline);
$regex="\[Tt\]he";
if($inputline ne "")
{
@splitarr= split(/$regex/,$inputline);
}
$scalar=@splitarr;
print $scalar;
The string is :
Hello the how are you the wanna work on the project but i the u the The
The output that it gives is 7. However with the string :
Hello the how are you the wanna work on the project but i the u the
the output is 5. I suspect my regex. Can anyone help in pointing out what's wrong.
Upvotes: 0
Views: 2006
Reputation: 5870
The following snippet uses a code side-effect to increment a counter, followed by an always-failing match to keep searching. It produces the correct answer for matches that overlap (e.g. "aaaa" contains "aa" 3 times, not 2). The split-based answers don't get that right.
my $i;
my $string;
$i = 0;
$string = "aaaa";
$string =~ /aa(?{$i++})(?!)/;
print "'$string' contains /aa/ x $i (should be 3)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the The";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 6)\n";
$i = 0;
$string = "Hello the how are you the wanna work on the project but i the u the";
$string =~ /[tT]he(?{$i++})(?!)/;
print "'$string' contains /[tT]he/ x $i (should be 5)\n";
Upvotes: 1
Reputation: 241808
With split
, you're counting the substrings between the the's. Use match instead:
#!/usr/bin/perl
use warnings;
use strict;
my $regex = qr/[Tt]he/;
for my $string ('Hello the how are you the wanna work on the project but i the u the The',
'Hello the how are you the wanna work on the project but i the u the',
'the theological cathedral'
) {
my $count = () = $string =~ /$regex/g;
print $count, "\n";
my @between = split /$regex/, $string;
print 0 + @between, "\n";
print join '|', @between;
print "\n";
}
Note that both methods return the same number for the two inputs you mentioned (and the first one returns 6, not 7).
Upvotes: 1
Reputation: 126722
I get the correct number - 6 - for the first string
However your method is wrong, because if you count the number of pieces you get by splitting on the regex pattern it will give you different values depending on whether the word appears at the beginning of the string. You should also put word boundaries \b
into your regular expression to prevent the regex from matching something like theory
Also, it is unnecessary to escape the square brackets, and you can use the /i
modifier to do a case-independent match
Try something like this instead
use strict;
use warnings;
print 'Enter the String: ';
my $inputline = <>;
chomp $inputline;
my $regex = 'the';
if ( $inputline ne '' ) {
my @matches = $inputline =~ /\b$regex\b/gi;
print scalar @matches, " occurrences\n";
}
Upvotes: 3
Reputation: 3223
What you need is 'countof' operator to count the number of matches:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/[Tt]he/g;
print $count;
If you want to select only the word the
or The
, add word boundary:
my $string = "Hello the how are you the wanna work on the project but i the u the The";
my $count = () = $string =~/\b[Tt]he\b/g;
print $count;
Upvotes: 0