Reputation: 1998

Perl regular expression removing duplicate consecutive substrings in a string

I tried to do a search on this particular problem, but all I get is either removal of duplicate lines or removal of repeated strings where they are separated by a delimiter.

My problem is slightly different. I have a string such as

    "comp name1 comp name2 comp name2 comp name3"

where I want to remove the repeated comp name2 and return only

    "comp name1 comp name2 comp name3"

They are not consecutive duplicate words, but consecutive duplicate substrings. Is there a way to solve this using regular expressions?

Upvotes: 4

Answers (5)

Anonymous

Reputation: 21

To avoid removing duplicate characters within a term (e.g. comm1 -> com1), bracket the .* in the regular expression with \b:

s/(\b.*\b)\1/$1/g
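
A quick way to see the difference is to run both patterns on the same inputs. This is only a sketch: the helper names dedupe_plain and dedupe_bounded are made up for the comparison and are not part of either answer.

```perl
use strict;
use warnings;

# Hypothetical helper names, introduced only for this comparison.
sub dedupe_plain {
    my ($s) = @_;
    $s =~ s/(.*)\1/$1/g;        # unbounded: can collapse doubled letters inside a word
    return $s;
}

sub dedupe_bounded {
    my ($s) = @_;
    $s =~ s/(\b.*\b)\1/$1/g;    # repeated text must start and end at a word boundary
    return $s;
}

for my $s ("comm1", "comp name1 comp name2 comp name2 comp name3") {
    print "In:      <<$s>>\n";
    print "Plain:   <<", dedupe_plain($s), ">>\n";
    print "Bounded: <<", dedupe_bounded($s), ">>\n";
}
```

On comm1 the plain pattern returns com1 while the bounded one leaves it alone; on the string from the question both return "comp name1 comp name2 comp name3".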

Upvotes: 2

Alex Reynolds

Reputation: 96947

If you need something running in linear time, you could split the string and iterate through the list:

#!/usr/bin/perl

use strict;
use warnings;

my $str = "comp name1 comp name2 comp name2 comp name3";
my @elems = split ' ', $str;

# Walk the words two at a time and keep a pair only if it differs
# from the pair immediately before it.
my @kept;
my $prevPair;
for (my $i = 0; $i < @elems; $i += 2) {
    # A trailing unpaired word is treated as a pair of its own.
    my $last = $i + 1 < @elems ? $i + 1 : $#elems;
    my $pair = join ' ', @elems[$i .. $last];
    next if defined $prevPair && $pair eq $prevPair;
    push @kept, $pair;
    $prevPair = $pair;
}
print join(' ', @kept), "\n";

Dirty, perhaps, but should run faster. It assumes every item is exactly two words, such as comp name2.

Upvotes: 1

John Sobolewski

Reputation: 4572

I never work with languages that support this, but since you are using Perl, go here and see this section:

Useful Example: Checking for Doubled Words

When editing text, doubled words such as "the the" easily creep in. Using the regex \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply type in \1 as the replacement text and click the Replace button.
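
Translated into a Perl substitution, the quoted pattern might look like the sketch below (the sub name remove_doubled_words is made up for illustration). It handles a single repeated word, but the repeated unit in the question spans two words, so that string comes back unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the doubled-word pattern quoted above.
sub remove_doubled_words {
    my ($s) = @_;
    $s =~ s/\b(\w+)\s+\1\b/$1/g;   # one word, whitespace, then the same word again
    return $s;
}

print remove_doubled_words("the the cat"), "\n";
# prints: the cat
print remove_doubled_words("comp name1 comp name2 comp name2 comp name3"), "\n";
# prints the input unchanged: no single word is immediately repeated
```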

Upvotes: 1

Jonathan Leffler

Reputation: 754090

This works for me (MacOS X 10.6.7, Perl 5.13.4):

use strict;
use warnings;

my $input = "comp name1 comp name2 comp name2 comp name3" ;
my $output = "comp name1 comp name2 comp name3" ;

my $result = $input;
$result =~ s/(.*)\1/$1/g;

print "In:   <<$input>>\n";
print "Want: <<$output>>\n";
print "Got:  <<$result>>\n";

The key point is the \1 backreference in the pattern: it only matches the same text that the first group has just captured.

Upvotes: 3

btilly

Reputation: 46408

s/(.*)\1/$1/g

Be warned that the running time of this regular expression is quadratic in the length of the string.
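
One rough way to check how the cost grows is to time the substitution on inputs of increasing length. The sketch below uses a synthetic string of generated words, and time_dedupe is a made-up helper name; absolute timings will vary with the machine and the Perl version, so no expected output is shown.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: time the substitution on a synthetic string of $n words.
# The substituted string is thrown away; only the elapsed time is returned.
sub time_dedupe {
    my ($n) = @_;
    my $s = join ' ', map { "w$_" } 1 .. $n;
    my $start = Time::HiRes::time();
    $s =~ s/(.*)\1/$1/g;
    return Time::HiRes::time() - $start;
}

# Doubling the word count each time makes the growth rate easy to eyeball.
for my $n (250, 500, 1000) {
    printf "%5d words: %.3f s\n", $n, time_dedupe($n);
}
```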

Upvotes: 8
