Reputation: 9803
$string1 = "peachbananaapplepear";
$string2 = "juicenaapplewatermelonpear";
I want to know what's the longest common substring containing the word "apple".
$string2 =~ m/.+apple.+/;
print $string2;
So I use the match operator, and .+
for matching any character before and after the keyword "apple". When I print $string2
it doesn't return naapple
but returns the original $string2
instead.
Upvotes: 0
Views: 545
Reputation: 1933
Here is one approach. First get the locations where 'apple' appears in the strings. And for each of those locations in string1, look at all locations in string2. Look to the left and right to see how far the commonality extends from the initial location.
$string1 = "peachbananaapplepear12345applegrapeapplebcdefghijk";
$string2 = "juicenaapplewatermelonpearkiwi12345applebcdefghijkberryapple";
my $SearchFor="apple";
my $SearchStrLen = length($SearchFor);
# Get locations in first string where the search term appears
my @FirstPositions = getPostions($string1);
# Get locations in second string where the search term appears
my @SecondPositions = getPostions($string2);
CheckForMaxMatch();
sub getPostions
{
my $GivenString = shift;
my @Positions;
my $j=0;
for (my $i=0; $i < length($GivenString); $i += ($SearchStrLen+1) )
{
$j = index($GivenString, $SearchFor, $i);
if ($j == -1) {
last;
}
push (@Positions, $j);
$i = $j;
}
return @Positions;
}
sub CheckForMaxMatch
{
my $MaxLeft=0;
# From the location of 'apple', look to the left and right
# to see how far the characters are same
for my $i (@FirstPositions) {
for my $j (@SecondPositions) {
my $LeftMatchPos = getMaxMatch($i, $j, -1);
my $RightMatchPos = getMaxMatch($i, $j, 1);
if ( ($RightMatchPos - $LeftMatchPos) > ($MaxRight - $MaxLeft) ) {
$MaxLeft = $LeftMatchPos;
$MaxRight = $RightMatchPos;
}
}
}
my $LongestSubString = substr($string1, $MaxLeft, $MaxRight-$MaxLeft);
print "Longest common substring is: $LongestSubString\n";
print "It begins at $MaxLeft and ends at $MaxRight in string1\n";
}
sub getMaxMatch
{
my $i= shift;
my $j= shift;
my $direction= shift;
my $k = ($direction >= 1 ? $SearchStrLen : 0);
my $FirstChar = substr($string1, $i+($k * $direction), 1);
my $SecondChar = substr($string2, $j+($k * $direction), 1);
for ( ; $FirstChar && $SecondChar; $k++ )
{
$FirstChar = substr($string1, $i+($k * $direction), 1);
$SecondChar = substr($string2, $j+($k * $direction), 1);
if ( $FirstChar ne $SecondChar ) {
$direction < 1 ? $k-- : "";
my $pos = ($k ? ($i + $k * $direction) : $i);
return $pos;
}
}
return $i;
}
Upvotes: 1
Reputation: 386331
Based on the general algorithm, but tracks not only the length of the current run (@l
), but whether it includes the keyword (@k
). Only runs that include the keyword are considered for longest run.
use strict;
use warnings;
use feature qw( say );
sub find_substrs {
our $s; local *s = \shift;
our $key; local *key = \shift;
my @positions;
my $position = -1;
while (1) {
$position = index($s, $key, $position+1);
last if $position < 0;
push @positions, $position;
}
return @positions;
}
sub lcsubstr_which_include {
our $s1; local *s1 = \shift;
our $s2; local *s2 = \shift;
our $key; local *key = \shift;
my @key_starts1 = find_substrs($s1, $key)
or return;
my @key_starts2 = find_substrs($s2, $key)
or return;
my @is_key_start1; $is_key_start1[$_] = 1 for @key_starts1;
my @is_key_start2; $is_key_start2[$_] = 1 for @key_starts2;
my @s1 = split(//, $s1);
my @s2 = split(//, $s2);
my $length = 0;
my @rv;
my @l = ( 0 ) x ( @s1 + 1 ); # Last ele is read when $i1==0.
my @k = ( 0 ) x ( @s1 + 1 ); # Same.
for my $i2 (0..$#s2) {
for my $i1 (reverse 0..$#s1) {
if ($s1[$i1] eq $s2[$i2]) {
$l[$i1] = $l[$i1-1] + 1;
$k[$i1] = $k[$i1-1] || ( $is_key_start1[$i1] && $is_key_start2[$i2] );
if ($k[$i1]) {
if ($l[$i1] > $length) {
$length = $l[$i1];
@rv = [ $i1, $i2, $length ];
}
elsif ($l[$i1] == $length) {
push @rv, [ $i1, $i2, $length ];
}
}
} else {
$l[$i1] = 0;
$k[$i1] = 0;
}
}
}
for (@rv) {
$_->[0] -= $length;
$_->[1] -= $length;
}
return @rv;
}
{
my $s1 = "peachbananaapplepear";
my $s2 = "juicenaapplewatermelonpear";
my $key = "apple";
for (lcsubstr_which_include($s1, $s2, $key)) {
my ($s1_pos, $s2_pos, $length) = @$_;
say substr($s1, $s1_pos, $length);
}
}
This solution in O(NM), meaning it scales amazingly well (for what it does).
Upvotes: 0
Reputation: 1202
The =~ operator is not going to reassign the value of $string2. Try this:
$string2 =~ m/(.+apple.+)/;
my $match = $1;
print $match
Upvotes: 0