fahim fana

Reputation: 19

Count the articles in a Paragraph

I need to count the articles (a, an, the) in a paragraph using Perl. I tried the following, but it fails:

$a += scalar(split(/a./, $_));
$an += scalar(split(/\san\s/, $_));
$the += scalar(split(/the/, $_));

Upvotes: 1

Views: 87

Answers (3)

Borodin

Reputation: 126742

The regex that @npinti suggested will work for you, but you need to use a global pattern match in list context and then evaluate the resulting list in scalar context to get the count.

Like this:

use strict;
use warnings;

my $s = 'I need to count the articles (a , an, the) in a paragraph using perl.';

my @matches = $s =~ /\b(a|an|the)\b/g;
print scalar @matches, "\n";

Output:

5
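If separate totals for each article are wanted, as in the question's $a, $an and $the counters, the same global match can feed a hash instead. This is only a minimal sketch: the sample sentence, the %count hash and the /i flag (so that a capitalised "The" is counted too) are assumptions added for illustration, not part of the original answer.

```perl
use strict;
use warnings;

# Sample text chosen for illustration only.
my $text = 'The cat saw an owl and a mouse near the barn.';

# One counter per article; lc() folds "The" and "the" together.
my %count = (a => 0, an => 0, the => 0);
$count{lc $1}++ while $text =~ /\b(a|an|the)\b/gi;

printf "%s: %d\n", $_, $count{$_} for qw(a an the);
```

For this sample it prints a: 1, an: 1 and the: 2, each on its own line; the "an" inside "and" is not counted because \b requires a word boundary on both sides.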

Upvotes: 2

vks

Reputation: 67988

(?:^|(?<=\s))(?:a|an|the)(?=\s|$)

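You can use this to count the articles. It only matches an article that is delimited by whitespace or by the start or end of the string, so an article that touches punctuation is not counted.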
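To see how this pattern differs from the \b version, here is a small comparison run on the sentence from the question. It is a sketch rather than part of the answer: the variable names are made up, and the "= () =" idiom is just one common way to count matches.

```perl
use strict;
use warnings;

my $s = 'I need to count the articles (a , an, the) in a paragraph using perl.';

# Assigning to an empty list forces list context; the outer scalar
# assignment then receives the number of matches.
my $boundary   = () = $s =~ /\b(?:a|an|the)\b/g;
my $whitespace = () = $s =~ /(?:^|(?<=\s))(?:a|an|the)(?=\s|$)/g;

print "word boundary: $boundary\n";
print "whitespace only: $whitespace\n";
```

On this sentence the word-boundary pattern finds 5 articles, while the whitespace-only pattern finds 2, because "(a", "an," and "the)" all touch punctuation.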

Upvotes: 0

npinti

Reputation: 52185

Try using something like this: \b(a|an|the)\b. This can be broken down as:

  • \ba\b # look for the a article.
  • \ban\b # look for the an article.
  • \bthe\b # look for the the article.

The problem with your regexes is that, with the exception of the an regex, they do not check whether the article is a word by itself.

The first regex matches any a followed by any character (other than a newline), while the third matches the wherever it occurs, including inside longer words such as other.

The \b matches at a word boundary, that is, between a word character and a non-word character or at the start or end of the string. The article therefore has to stand alone, whether it is delimited by whitespace or by punctuation.
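A further point worth checking in the original attempt is that split returns the fields between matches, so counting its results does not count the matches themselves. The sketch below illustrates this on a made-up string; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Made-up sample: one real article plus "the" hidden inside "other".
my $s = 'the other one';

# split returns the pieces between matches: '', ' o', 'r one'.
my @fields = split /the/, $s;

# Global matches in list context, without and with word boundaries.
my @loose = $s =~ /the/g;
my @words = $s =~ /\bthe\b/g;

print "fields: ", scalar @fields, "\n";
print "loose matches: ", scalar @loose, "\n";
print "whole words: ", scalar @words, "\n";
```

Here split yields 3 fields, the unanchored pattern matches twice, and only the \b version reports the single article.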

Upvotes: 1
