Tom Iv

Reputation: 449

How to find the largest repeating string with overlap in a line

I have a series of lines such as

my $string = "home test results results-apr-25 results-apr-251.csv";
@str = $string =~ /(\w+)\1+/i;
print "@str";

How do I find the longest repeated string, allowing for overlap, among the whitespace-separated words? In this case I'm looking for the output:

results-apr-25

Upvotes: 2

Views: 117

Answers (4)

Borodin

Reputation: 126742

It looks like you need String::LCSS_XS, which calculates Longest Common SubStrings. Don't try its Perl-only twin brother String::LCSS because there are bugs in that one.

use strict;
use warnings;

use String::LCSS_XS;
*lcss = \&String::LCSS_XS::lcss; # Manual import of `lcss`

my $var = 'home test results results-apr-25 results-apr-251.csv';
my @words = split ' ', $var;

my $longest;
my ($first, $second);

for my $i (0 .. $#words) {
  for my $j ($i + 1 .. $#words) {
    my $lcss = lcss(@words[$i,$j]);
    unless ($longest and length $lcss <= length $longest) {
      $longest = $lcss;
      ($first, $second) = @words[$i,$j];
    }
  }
}

printf qq{Longest common substring is "%s" between "%s" and "%s"\n}, $longest, $first, $second;

output

Longest common substring is "results-apr-25" between "results-apr-25" and "results-apr-251.csv"
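
If installing an XS module is not an option, the same idea can be sketched in plain Perl with a small dynamic-programming table. This is only an illustration: `longest_common_substring` is a hypothetical helper written for this example, not part of String::LCSS_XS, and it returns only the first longest match it finds.

```perl
use strict;
use warnings;

# Hypothetical helper (not from String::LCSS_XS): longest common substring
# of two strings. $curr[$j] holds the length of the common run that ends
# at character $i of $s and character $j of $t.
sub longest_common_substring {
    my ($s, $t) = @_;
    my @s = split //, $s;
    my @t = split //, $t;
    my ($best_len, $best_end) = (0, 0);
    my @prev = (0) x (@t + 1);

    for my $i (1 .. @s) {
        my @curr = (0) x (@t + 1);
        for my $j (1 .. @t) {
            next unless $s[$i - 1] eq $t[$j - 1];
            $curr[$j] = $prev[$j - 1] + 1;      # extend the run from the diagonal
            if ($curr[$j] > $best_len) {
                ($best_len, $best_end) = ($curr[$j], $i);
            }
        }
        @prev = @curr;
    }
    return substr $s, $best_end - $best_len, $best_len;
}

print longest_common_substring('results-apr-25', 'results-apr-251.csv'), "\n";
# prints "results-apr-25"
```

It could replace the `lcss` call in the loop above, at the cost of being much slower than the XS version on long strings.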

Upvotes: 2

scozy

Reputation: 2582

If you want to do it without a loop, you will need the /g regex modifier.

This will get you all the repeating strings:

my @str = $string =~ /(\S+)(?=\s\1)/ig;

I have replaced \w with \S (in your example, \w doesn't match -), and used a look-ahead: (?=\s\1) means match something that is before \s\1, without matching \s\1 itself—this is required to make sure that the next match attempt starts after the first string, not after the second.
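
To see what the look-ahead buys you, here is a small comparison on the string from the question. The variable names `@consumed` and `@peeked` are made up for this illustration.

```perl
use strict;
use warnings;

my $string = "home test results results-apr-25 results-apr-251.csv";

# Consuming version: the repeat is part of the match, so the next
# attempt starts after it, in the middle of "results-apr-25".
my @consumed = $string =~ /(\S+)\s\1/ig;

# Look-ahead version: the repeat is only checked, not consumed, so the
# next attempt starts right after the captured string.
my @peeked = $string =~ /(\S+)(?=\s\1)/ig;

print "consumed: @consumed\n";   # consumed: results
print "peeked:   @peeked\n";     # peeked:   results results-apr-25
```

Without the look-ahead, "results-apr-25" is never found, because its first seven characters were already swallowed by the previous match.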

Then, it is simply a matter of extracting the longest string from @str:

my $longest = (sort { length $b <=> length $a } @str)[0];

(Do note that this is legible but far from the most efficient way of finding the longest value; that is the subject of a different question.)
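
For what it's worth, one way to avoid the sort is a single pass with `reduce` from the core List::Util module. This is just a sketch; `longest_of` is a name invented here, and on ties it keeps the earliest candidate.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical helper: return the longest string in the list using one
# linear pass instead of a full sort. Returns undef for an empty list.
sub longest_of {
    my @candidates = @_;
    return reduce { length($b) > length($a) ? $b : $a } @candidates;
}

print longest_of('results', 'results-apr-25'), "\n";   # prints "results-apr-25"
```

With only a handful of matches per line, the difference is negligible, so the sort version is fine unless `@str` gets large.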

Upvotes: 1

David Michael Gang

Reputation: 7299

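use strict;
use warnings;
use List::Util qw(max);   # max() is not a built-in

my $var = "home test results results-apr-25 results-apr-251.csv";
my @str = split " ", $var;
my %h;                    # length => word that is a prefix of its neighbour
my $last = pop @str;

# Walk the words from right to left, comparing each with its right-hand neighbour
while (defined(my $curr = pop @str)) {
        # \Q...\E makes the other word match literally (e.g. the "." in ".csv")
        if ($curr =~ /^\Q$last\E/) {
                $h{length($last)} = $last;   # $last is the shared prefix
        }
        elsif ($last =~ /^\Q$curr\E/) {
                $h{length($curr)} = $curr;   # $curr is the shared prefix
        }
        $last = $curr;
}

my $max_key = max(keys %h);
print $h{$max_key}, "\n" if defined $max_key;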

Upvotes: 1

Toto

Reputation: 91488

How about:

use feature 'say';   # say() must be enabled explicitly
my $var = "home test results results-apr-25 results-apr-251.csv";
my $l = length $var;
for (my $i=int($l/2); $i; $i--) {
    if ($var =~ /(\S{$i}).*\1/) {
        say "found: $1";
        last;
    }
}

output:

found: results-apr-25

Upvotes: 0
