Reputation: 449
I have a series of lines such as
my $string = "home test results results-apr-25 results-apr-251.csv";
@str = $string =~ /(\w+)\1+/i;
print "@str";
How do I find the longest repeated string, allowing overlap, among the whitespace-separated words? In this case I'm looking for the output:
results-apr-25
Upvotes: 2
Views: 117
Reputation: 126742
It looks like you need the String::LCSS_XS module, which calculates Longest Common SubStrings. Don't try its Perl-only twin brother String::LCSS, because there are bugs in that one.
use strict;
use warnings;
use String::LCSS_XS;
*lcss = \&String::LCSS_XS::lcss; # Manual import of `lcss`
my $var = 'home test results results-apr-25 results-apr-251.csv';
my @words = split ' ', $var;
my $longest;
my ($first, $second);
for my $i (0 .. $#words) {
    for my $j ($i + 1 .. $#words) {
        my $lcss = lcss(@words[$i,$j]);
        unless ($longest and length $lcss <= length $longest) {
            $longest = $lcss;
            ($first, $second) = @words[$i,$j];
        }
    }
}
printf qq{Longest common substring is "%s" between "%s" and "%s"\n}, $longest, $first, $second;
output
Longest common substring is "results-apr-25" between "results-apr-25" and "results-apr-251.csv"
Upvotes: 2
Reputation: 2582
If you want to do it without a loop, you will need the /g regex modifier.
This will get you all the repeating strings:
my @str = $string =~ /(\S+)(?=\s\1)/ig;
I have replaced \w with \S (in your example, \w doesn't match -), and used a look-ahead: (?=\s\1) means match something that comes before \s\1, without matching \s\1 itself. This is required to make sure that the next match attempt starts after the first string, not after the second.
Then, it is simply a matter of extracting the longest string from @str:
my $longest = (sort { length $b <=> length $a } @str)[0];
(Do note that sorting is a legible but far from the most efficient way of finding the longest value; that is the subject of a different question.)
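As a rough sketch of how the two steps above might fit together without a full sort, the following wraps the same regex in a helper and picks the longest capture in a single pass with List::Util's reduce. The name longest_repeat is hypothetical and not part of the original answer; it assumes the repeated words are adjacent and separated by a single whitespace character, as in the question's example.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper (an assumption, not from the original answer):
# returns the longest word that is immediately followed, after one
# whitespace character, by a word beginning with the same text.
# Returns undef when no such word exists.
sub longest_repeat {
    my ($string) = @_;

    # Same pattern as above: capture a run of non-whitespace that is
    # followed by whitespace and a copy of itself.
    my @candidates = $string =~ /(\S+)(?=\s\1)/ig;

    # Single pass: keep whichever of the two values is longer.
    return reduce { length($b) > length($a) ? $b : $a } @candidates;
}

print longest_repeat('home test results results-apr-25 results-apr-251.csv'), "\n";
```

For the string in the question this prints results-apr-25. reduce returns undef for an empty list, so a caller would want to check the result before printing it.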
Upvotes: 1
Reputation: 7299
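use List::Util qw(max);   # max() is not a Perl built-in

my $var = "home test results results-apr-25 results-apr-251.csv";
my @str = split " ", $var;
my %h;
my $last = pop @str;
# Walk the words from right to left, comparing each with its right-hand neighbour.
while (defined(my $curr = pop @str)) {
    # \Q...\E quotes regex metacharacters such as "." inside the words.
    if ($curr =~ /^\Q$last\E/ || $last =~ /^\Q$curr\E/) {
        $h{length $curr} = $curr;
    }
    $last = $curr;
}
my $max_key = max(keys %h);
print $h{$max_key}, "\n";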
Upvotes: 1
Reputation: 91488
How about:
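use feature 'say';   # enables say()

my $var = "home test results results-apr-25 results-apr-251.csv";
my $l = length $var;
# Try the longest possible repeat first and shrink until one matches.
for (my $i = int($l / 2); $i; $i--) {
    if ($var =~ /(\S{$i}).*\1/) {
        say "found: $1";
        last;
    }
}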
output:
found: results-apr-25
Upvotes: 0