Mounnad Mohamed

Reputation: 23

Calculate the Number of Consecutive Substrings in a String Using Perl

I have a string containing several runs of a consecutively repeated substring, like:

my $substring = "CAG";
my $str = "CAGCAGCAGCAGPGHSMCAGCAG";

I want to find the maximum number of consecutive repetitions of $substring in $str (4 in this example).

Upvotes: 0

Views: 149

Answers (3)

ysth

Reputation: 98398

my $substring = 'CAG';
my $str = 'CAGCAGCAGCAGPGHSMCAGCAG';
# look for a series of consecutive $substring not followed later by a longer such series
my ($longest_substring) = $str =~ /((?:\Q$substring\E)+)(?!.*?\1\Q$substring\E)/s;
my $repetitions = length($longest_substring // '') / length($substring);
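
As a cross-check, here is a minimal sketch of a different approach: capture the run of consecutive copies starting at every position and keep the longest one. The sub name max_consecutive is hypothetical, and the look-ahead is used on the assumption that runs should also be found when the substring can overlap itself (e.g. 'ABA' in 'ABABAABA').

```perl
use strict;
use warnings;
use List::Util qw(max);   # core module

# Hypothetical helper: largest number of back-to-back copies of $substring in $str.
sub max_consecutive {
    my ($str, $substring) = @_;
    return 0 unless length $substring;   # avoid an empty pattern and division by zero

    # The look-ahead is zero-width, so /g tries every position in turn.
    # In list context the match returns capture group 1 for each position
    # where at least one copy of $substring starts.
    my @runs = $str =~ /(?=((?:\Q$substring\E)+))/g;

    # The leading 0 makes the result 0 when nothing matched.
    return max(0, map { length($_) / length($substring) } @runs);
}

print max_consecutive('CAGCAGCAGCAGPGHSMCAGCAG', 'CAG'), "\n";   # prints 4
```

This trades speed for simplicity, since every position inside a long run is examined again, which should be fine for short strings like the one in the question.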

Upvotes: 2

Michael Tétreault

Reputation: 424

Try this:

my $number = () = $str =~ /$substring/gi;
print $number;

Upvotes: 0

choroba

Reputation: 241888

The match operator with the /g modifier returns all the matches in list context. To count them, store them in an array and evaluate the array in scalar context:

my @matches = $str =~ /$substring/g;
my $count = scalar @matches;

which gives 6 for the string in the question.

It can be further shortened to

my $count = () = $str =~ /$substring/g;

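Here the () = assignment forces list context on the match, and assigning the result to a scalar variable evaluates that list assignment in scalar context, which yields the number of elements on its right-hand side.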

Note that this undercounts when the matches overlap, e.g.

my $str = 'CACACAC';
my $substring = 'CAC';

For this input the expression above returns 2 rather than 3, because with /g the search for the next match starts where the previous match ended. To count overlapping matches, use a look-ahead assertion, which doesn't consume the matched text:

my $count = () = $str =~ /(?=$substring)/g;
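
To make the difference concrete, the following sketch wraps both counting expressions in subs and runs them on the overlapping example. The sub names count_nonoverlapping and count_overlapping are hypothetical, and \Q...\E is added on the assumption that the substring should be matched literally rather than as a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: count matches, each one starting after the previous one ended.
sub count_nonoverlapping {
    my ($str, $substring) = @_;
    return 0 unless length $substring;   # an empty pattern has special meaning in Perl
    my $count = () = $str =~ /\Q$substring\E/g;
    return $count;
}

# Hypothetical helper: count every position where $substring starts.
sub count_overlapping {
    my ($str, $substring) = @_;
    return 0 unless length $substring;
    # The look-ahead matches without consuming text, so /g advances one position at a time.
    my $count = () = $str =~ /(?=\Q$substring\E)/g;
    return $count;
}

print count_nonoverlapping('CACACAC', 'CAC'), "\n";   # prints 2
print count_overlapping('CACACAC', 'CAC'), "\n";      # prints 3
```

For a substring that cannot overlap itself, such as 'CAG', both subs return the same number.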

Upvotes: 1
