Reputation: 1998
I searched for this particular problem, but all I found was either removal of duplicate lines or removal of repeated strings that are separated by a delimiter.
My problem is slightly different. I have a string such as
"comp name1 comp name2 comp name2 comp name3"
where I want to remove the repeated comp name2 and return only
"comp name1 comp name2 comp name3"
They are not consecutive duplicate words, but consecutive duplicate substrings. Is there a way to solve this using regular expressions?
Upvotes: 4
Views: 7191
Reputation: 21
To avoid removing duplicate characters within the terms (e.g. comm1 -> com1), bracket the .* in the regular expression with \b:
s/(\b.*\b)\1/$1/g
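One thing to watch with the \b version: nothing anchors the end of the repeated copy, so "x comp name2 comp name22" can still lose a term. Below is a minimal sketch of a stricter variant, assuming the terms consist only of \w characters separated by whitespace. The name dedupe_phrases is just an illustrative helper, not something from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a run of whole words that is immediately
# repeated (one or more times) into a single copy.
# Assumption: terms are made of \w characters separated by whitespace.
sub dedupe_phrases {
    my ($str) = @_;
    # \b(...)   : a phrase of one or more whole words, starting at a word boundary
    # \s+\1\b   : the same phrase again, which must also end at a word boundary
    $str =~ s/\b(\w+(?:\s+\w+)*)(?:\s+\1\b)+/$1/g;
    return $str;
}

print dedupe_phrases("comp name1 comp name2 comp name2 comp name3"), "\n";
# prints: comp name1 comp name2 comp name3
print dedupe_phrases("x comp name2 comp name22"), "\n";
# prints: x comp name2 comp name22 (left unchanged)
```

It is still a backtracking regex, so it is not fast on long strings.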
Upvotes: 2
Reputation: 96947
If you need something running in linear time, you could split
the string and iterate through the list:
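    #!/usr/bin/perl
    use strict;
    use warnings;

    my $str = "comp name1 comp name2 comp name2 comp name3";
    # Assumes the tokens alternate: "comp" at even indices, the name at odd ones.
    my @elems = split(/\s+/, $str);
    my $prevComp;        # the last name seen
    my $prevFlag = -1;   # -1: first token, 0: last name was new, 1: last name was a duplicate
    foreach my $elemIdx (0 .. $#elems) {
        if ($elemIdx % 2 == 1) {
            # Odd index: a name. Print it only if it differs from the previous name.
            if (defined $prevComp) {
                if ($prevComp ne $elems[$elemIdx]) {
                    print " $elems[$elemIdx]";
                    $prevFlag = 0;
                }
                else {
                    # Duplicate name: its "comp" is already printed, so skip the next one.
                    $prevFlag = 1;
                }
            }
            else {
                print " $elems[$elemIdx]";
            }
            $prevComp = $elems[$elemIdx];
        }
        elsif ($prevFlag == -1) {
            # Very first token: no leading space.
            print "$elems[$elemIdx]";
            $prevFlag = 0;
        }
        elsif ($prevFlag == 0) {
            print " $elems[$elemIdx]";
        }
    }
    print "\n";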
Dirty, perhaps, but should run faster.
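The same split-and-iterate idea can be written more compactly by treating each two-word term as one unit. This is only a sketch under the assumption that every term is exactly two whitespace-separated words, as in the question's sample. dedupe_pairs is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: group words into two-word terms and drop a term
# when it equals the term that was kept immediately before it.
# Assumption: every term is exactly two whitespace-separated words.
sub dedupe_pairs {
    my ($str) = @_;
    my @words = split ' ', $str;
    my @out;
    for (my $i = 0; $i < @words; $i += 2) {
        # The last term may be a single word if the word count is odd.
        my $last = $i + 1 < @words ? $i + 1 : $i;
        my $term = join ' ', @words[$i .. $last];
        push @out, $term unless @out && $out[-1] eq $term;
    }
    return join ' ', @out;
}

print dedupe_pairs("comp name1 comp name2 comp name2 comp name3"), "\n";
# prints: comp name1 comp name2 comp name3
```

It makes one pass over the words and also handles a duplicate at the very end of the string, since nothing is printed until a term has been compared with its predecessor.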
Upvotes: 1
Reputation: 4572
I never work with languages that support this, but since you are using Perl, see this section of the linked page:
Useful Example: Checking for Doubled Words
When editing text, doubled words such as "the the" easily creep in. Using the regex \b(\w+)\s+\1\b in your text editor, you can easily find them. To delete the second word, simply type in \1 as the replacement text and click the Replace button.
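For reference, here is roughly how that doubled-word regex looks as a Perl substitution. It is a small sketch (dedupe_words is a hypothetical helper name). Note that it only catches a single repeated word, so the two-word terms from the question pass through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the second copy of an immediately doubled word.
sub dedupe_words {
    my ($str) = @_;
    $str =~ s/\b(\w+)\s+\1\b/$1/g;
    return $str;
}

print dedupe_words("the the cat"), "\n";
# prints: the cat
print dedupe_words("comp name1 comp name2 comp name2 comp name3"), "\n";
# prints the input unchanged: no single word is immediately repeated
```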
Upvotes: 1
Reputation: 754090
This works for me (MacOS X 10.6.7, Perl 5.13.4):
use strict;
use warnings;
my $input = "comp name1 comp name2 comp name2 comp name3" ;
my $output = "comp name1 comp name2 comp name3" ;
my $result = $input;
$result =~ s/(.*)\1/$1/g;
print "In: <<$input>>\n";
print "Want: <<$output>>\n";
print "Got: <<$result>>\n";
The key point is the '\1' in the matching.
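A quick check of what that substitution does beyond the sample input may help. Because (.*) can match any run of characters, it also collapses doubled letters inside a word. The sketch below wraps the same regex in an illustrative helper (collapse_repeats is my name for it, not part of the answer).

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from this answer.
sub collapse_repeats {
    my ($str) = @_;
    # (.*) captures any run of characters; \1 requires that same run
    # to follow immediately, and the pair is replaced by one copy.
    $str =~ s/(.*)\1/$1/g;
    return $str;
}

print collapse_repeats("comp name1 comp name2 comp name2 comp name3"), "\n";
# prints: comp name1 comp name2 comp name3
print collapse_repeats("comm name1"), "\n";
# prints: com name1 (the doubled "m" is collapsed too)
```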
Upvotes: 3
Reputation: 46408
s/(.*)\1/$1/g
Be warned that the running time of this regular expression is quadratic in the length of the string.
Upvotes: 8