Reputation: 1264
I haven't found a regex sample that works in Java for removing all duplicated characters.
This code does not work well: 'g' and '<' are removed, runs of more than two characters are not reduced to a single character, and '454' is reduced to '5'.
String s = "aa hgjii2222 22 FFonn;;;,,1111111111 22< 454";
Pattern p = Pattern.compile("(.)(.)");
Matcher m = p.matcher(s);
System.out.println(m.replaceAll("$1"));
Output :
a hji222 Fon;,11111 2 5
I've tried other solutions with less success.
Upvotes: 3
Views: 208
Reputation: 454960
You can do:
String s = "aa hgjii2222 22 FFonn;;;,,1111111111 22< 454";
s = s.replaceAll("(.)\\1+", "$1");
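The regex used is (.)\1+, written as "(.)\\1+" in a Java string literal because the backslash must be escaped:
(.) - Matches any non-newline character and captures it as group 1
\1+ - One or more repetitions of the character captured in group 1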
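For reference, here is a minimal self-contained sketch of that replacement run against the sample string from the question. The class name CollapseRuns and the method name collapse are made up for illustration; only String.replaceAll comes from the answer.

```java
class CollapseRuns {
    // Hypothetical helper: replaces every run of a repeated character
    // (two or more in a row) with a single copy of that character.
    static String collapse(String s) {
        // (.) captures one character, \1+ requires it to repeat at least once.
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        String s = "aa hgjii2222 22 FFonn;;;,,1111111111 22< 454";
        System.out.println(collapse(s)); // prints: a hgji2 2 Fon;,1 2< 454
    }
}
```

Note that '454' is left alone, since its two 4s are not adjacent.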
Upvotes: 5
Reputation: 103777
That pattern doesn't do what you hope at all.
It finds any character followed by any other character (not necessarily the same as the first one), and then replaces this two-character string with the first captured group (the first character).
In other words, it deletes every other character.
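A small sketch makes this easier to see on input with no duplicates at all. The class and method names here (PairDemo, keepFirstOfEachPair) are hypothetical; the pattern and the "$1" replacement are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairDemo {
    // Applies the question's pattern: each match consumes two characters
    // and is replaced by the first of them.
    static String keepFirstOfEachPair(String s) {
        Matcher m = Pattern.compile("(.)(.)").matcher(s);
        return m.replaceAll("$1");
    }

    public static void main(String[] args) {
        // No character repeats here, yet every second one disappears.
        System.out.println(keepFirstOfEachPair("abcdef")); // prints: ace
        // Looks correct only by coincidence: the pairs happen to line up.
        System.out.println(keepFirstOfEachPair("aabb"));   // prints: ab
    }
}
```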
I don't think that a regex is the right tool for this job; think about how this could be implemented as a finite-state automaton (FSA) and it should be clear that regular languages don't describe the problem well at all.
It would be much simpler, and arguably clearer, to do this in code. Keep a set of all the characters you've encountered so far, and skip any character that is already in the set as you iterate, something like:
final Set<Character> charsSeen = new HashSet<Character>();
final StringBuilder out = new StringBuilder();
for (char c : s.toCharArray()) {
    if (!charsSeen.contains(c)) {
        out.append(c);
        charsSeen.add(c);
    }
}
return out.toString();
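Note that the set-based loop drops every later occurrence of a character, so '454' would become '45'. If the goal is only to collapse consecutive repeats, which is my reading of the question's complaint about '454', the same loop idea could compare each character with the previous one instead. A rough sketch, assuming plain char-by-char handling is good enough (no special treatment of surrogate pairs); RunCollapser and collapseConsecutive are names invented for this example:

```java
class RunCollapser {
    // Hypothetical helper: keeps a character only when it differs
    // from the character immediately before it.
    static String collapseConsecutive(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i == 0 || c != s.charAt(i - 1)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "aa hgjii2222 22 FFonn;;;,,1111111111 22< 454";
        System.out.println(collapseConsecutive(s)); // prints: a hgji2 2 Fon;,1 2< 454
    }
}
```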
Upvotes: 0
Reputation: 64563
Use
"(.)\\1+"
instead.
This matches a character followed by one or more repetitions of that same character.
Upvotes: 2