cl-r

Reputation: 1264

How to remove all duplicated characters

I haven't found a regex example that works in Java for removing all duplicated characters.

This code does not work well: 'g' and '<' are removed, runs of more than two identical characters are not reduced to one, and '454' is reduced to '5'.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
String s = "aa  hgjii2222 22    FFonn;;;,,1111111111 22< 454";
Pattern p = Pattern.compile("(.)(.)");
Matcher m = p.matcher(s);
System.out.println(m.replaceAll("$1"));

Output:

a hji222 Fon;,11111 2 5

I've tried other solutions with less success.

Upvotes: 3

Views: 208

Answers (3)

codaddict

Reputation: 454960

You can do:

String s = "aa  hgjii2222 22    FFonn;;;,,1111111111 22< 454";
s = s.replaceAll("(.)\\1+","$1");

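The regex used is (.)\1+ (written "(.)\\1+" inside a Java string literal):

(.) - matches any character except a line terminator and captures it as group 1
\1+ - matches one or more further repetitions of the captured character

The replacement "$1" then keeps a single copy of that character in place of the whole run.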
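A small sketch checking the "non-newline" caveat (class and method names are hypothetical): by default . does not match \n, so a run of newlines is left alone. Adding the inline DOTALL flag (?s) makes . match line terminators too, so those runs collapse as well.

```java
// Collapses runs of identical adjacent characters with a backreference.
class CollapseRuns {
    // Default mode: '.' does not match line terminators such as '\n'.
    static String collapse(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // (?s) enables DOTALL, so '.' also matches '\n' and newline runs collapse too.
    static String collapseDotAll(String s) {
        return s.replaceAll("(?s)(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        String s = "aa\n\nbb";
        System.out.println(collapse(s).equals("a\n\nb"));     // true: the "\n\n" run is untouched
        System.out.println(collapseDotAll(s).equals("a\nb")); // true: "\n\n" becomes "\n"
    }
}
```

Whether newline runs should collapse depends on the input; for single-line strings like the one in the question, both variants give the same result.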

Upvotes: 5

Andrzej Doyle

Reputation: 103777

That pattern doesn't do what you hope at all.

It finds any character, followed by any character (not necessarily the same as the first one), and then replaces this two character string with the first match (the first character).

In other words, it deletes every other character.
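A quick check of that explanation (class and method names are hypothetical). Each match consumes two characters and keeps only the first, so which characters survive depends on where the pairs happen to start; that is why 'g' and '<' drop out of the question's output.

```java
// Shows why "(.)(.)" replaced by "$1" keeps only every other character.
class EveryOtherChar {
    static String pairsToFirst(String s) {
        // Each match eats two characters and is replaced by the first one;
        // a trailing unpaired character is never matched and stays as-is.
        return s.replaceAll("(.)(.)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(pairsToFirst("abcdef"));  // ace
        System.out.println(pairsToFirst("abcdefg")); // aceg
    }
}
```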

I don't think a regex is the right tool for this job; think about how it could be implemented as an FSA (finite-state automaton), and it should be clear that regular languages don't describe the problem well.

It would be much simpler, and arguably clearer, to do this in code: keep a set of all the characters you've encountered so far, and skip any character that is already in the set as you iterate. Something like:

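import java.util.HashSet;
import java.util.Set;

// Method name is illustrative
static String removeAllDuplicates(String s) {
    final Set<Character> charsSeen = new HashSet<Character>();
    final StringBuilder out = new StringBuilder();
    for (char c : s.toCharArray()) {
        // Append a character only the first time it is encountered
        if (!charsSeen.contains(c)) {
            out.append(c);
            charsSeen.add(c);
        }
    }
    return out.toString();
}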
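Note that this set-based approach answers a slightly different question than the regex answers: it removes every repeat of a character anywhere in the string, while "(.)\\1+" only collapses adjacent runs. A small sketch contrasting the two (class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

class RemoveAllDuplicates {
    // Keeps only the first occurrence of each character anywhere in the string.
    static String removeAllDuplicates(String s) {
        Set<Character> charsSeen = new HashSet<>();
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (charsSeen.add(c)) { // add() returns false if c was already in the set
                out.append(c);
            }
        }
        return out.toString();
    }

    // Collapses only runs of adjacent identical characters.
    static String collapseRuns(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"454", "aabba"}) {
            System.out.println(s + " -> " + removeAllDuplicates(s) + " vs " + collapseRuns(s));
        }
        // 454 -> 45 vs 454
        // aabba -> ab vs aba
    }
}
```

Which behavior is wanted depends on whether "duplicated" means "seen before anywhere" or "repeated in a row".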

Upvotes: 0

Igor Chubin

Reputation: 64563

Use

"(.)\\1+"

instead.

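It matches a character followed by one or more repetitions of that same character, so with the replacement "$1" each run collapses to a single copy.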
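Plugged into the question's own Pattern/Matcher code, the change is a one-line swap of the pattern. A minimal sketch (class and method names are hypothetical); precompiling the Pattern into a static field is a common choice when the same pattern is applied repeatedly:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuestionCodeFixed {
    // Same shape as the question's code, with "(.)(.)" replaced by "(.)\\1+".
    private static final Pattern RUN = Pattern.compile("(.)\\1+");

    static String collapseRuns(String s) {
        Matcher m = RUN.matcher(s);
        return m.replaceAll("$1");
    }

    public static void main(String[] args) {
        String s = "aa  hgjii2222 22    FFonn;;;,,1111111111 22< 454";
        System.out.println(collapseRuns(s)); // a hgji2 2 Fon;,1 2< 454
    }
}
```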

Upvotes: 2
