Reputation: 584
I need a regex to remove duplicate characters from a string, like so: abcdeafghid to bcefghi, removing every occurrence of a and d.
Honestly, I have no idea how to go about this. I can find plenty about removing duplicates, but those solutions always leave one instance of each duplicated character behind.
The order of the characters at the end doesn't matter, but since I'm working with CJK languages it should support those. How would I go about this?
Upvotes: 0
Views: 376
Reputation: 103744
You can do this in one of two ways:
One line in Python:
>>> s='abcdeafghid'
>>> ''.join(c for c in s if s.count(c)==1) # keep only characters that occur exactly once
'bcefghi'
Or, filter out the characters that occur more than once (in Python 3, filter returns an iterator, so join the result back into a string):
>>> s='abcdeafghid'
>>> ''.join(filter(lambda c: s.count(c)==1, s))
'bcefghi'
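Both one-liners call s.count(c) once per character, and each call scans the whole string, so they are quadratic in the string length. Here is a minimal sketch of a linear-time alternative that counts once with collections.Counter (the function name unique_chars_only is just an illustrative choice). Python 3 strings are Unicode, so CJK characters are counted one code point at a time:

```python
from collections import Counter


def unique_chars_only(s: str) -> str:
    """Return s with every character that occurs more than once removed."""
    counts = Counter(s)  # one pass to count each character
    # second pass keeps only characters seen exactly once, preserving order
    return ''.join(c for c in s if counts[c] == 1)


print(unique_chars_only('abcdeafghid'))  # bcefghi
print(unique_chars_only('漢字漢語'))  # 字語
```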
In Perl, you would build a hash of per-character counts, then either delete every character whose count is greater than 1 with a regex substitution, or append each character to a new string only if its count equals 1:
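use utf8;                            # decode this UTF-8 source so CJK literals are characters, not bytes
binmode STDOUT, ':encoding(UTF-8)';  # print wide characters without "Wide character" warnings
my $s = 'abcdeafghid';
my %h;
$h{$_}++ foreach split //, $s;       # count of all characters
while (my ($k, $v) = each %h) {
    $s =~ s/\Q$k\E//g if $v > 1;     # \Q...\E escapes regex metacharacters such as . or *
}
print $s;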
Or,
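use utf8;                            # decode this UTF-8 source so CJK literals are characters, not bytes
binmode STDOUT, ':encoding(UTF-8)';
my $s = 'abcdeafghid';
my %h;
my $new = '';
$h{$_}++ foreach split //, $s;                          # count of all characters
foreach (split //, $s) { $new .= $_ if $h{$_} == 1 }    # keep only characters seen once
print $new;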
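The first Perl variant counts and then deletes with a regex. Since the question asks for a regex, here is a minimal Python sketch of that same count-then-delete idea: all duplicated characters are folded into a single alternation so one re.sub call removes them, and re.escape guards against characters such as . or * that are special in a pattern. The function name strip_duplicates is hypothetical:

```python
import re
from collections import Counter


def strip_duplicates(s: str) -> str:
    """Delete every occurrence of each character that appears more than once."""
    dups = [c for c, n in Counter(s).items() if n > 1]
    if not dups:
        return s  # nothing to delete
    # e.g. for 'abcdeafghid' the pattern is 'a|d'
    pattern = '|'.join(map(re.escape, dups))
    return re.sub(pattern, '', s)


print(strip_duplicates('abcdeafghid'))  # bcefghi
print(strip_duplicates('a.b.c'))  # abc
```

The regex only does the deletion here; deciding which characters are duplicates still needs the count.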
Upvotes: 2
Reputation: 4974
Regardless of your language, you can use the pseudocode below:
Dictionary dict
for i = 0 to Len(your_string) - 1
    if Not(dict.Exists(your_string[i])) then
        dict.Add(your_string[i], 1)
    else
        dict[your_string[i]] += 1
    end if
Next i

int index = 0
while index < Len(your_string)
    if dict[your_string[index]] > 1 then
        ' remove every occurrence; the next character shifts into position index
        your_string = Replace(your_string, your_string[index], "")
    else
        index += 1
    end if
end while
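A direct Python translation of the pseudocode, as a sketch: a plain dict does the counting, and the scan removes each duplicated character with str.replace. Resetting index to 0 after a removal is not actually necessary: every character before index occurs only once, so replace can only delete characters at or after index. remove_repeated is an illustrative name:

```python
def remove_repeated(your_string: str) -> str:
    # first pass: count occurrences of each character
    counts = {}
    for ch in your_string:
        counts[ch] = counts.get(ch, 0) + 1

    index = 0
    while index < len(your_string):
        ch = your_string[index]
        if counts[ch] > 1:
            # delete every occurrence; the next character slides into `index`
            your_string = your_string.replace(ch, '')
        else:
            index += 1
    return your_string


print(remove_repeated('abcdeafghid'))  # bcefghi
```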
Upvotes: 1