Remove duplicated characters from String using regex keeping first occurrences

Question

I know how to remove duplicated characters from a String while keeping the first occurrences, without a regex:

String method(String s){
  String result = "";
  for(char c : s.toCharArray()){
    result += result.contains(c+"")
     ? ""
     : c;
  }
  return result;
}

// Example input: "Type unique chars!"
// Output:        "Type uniqchars!"

I know how to remove duplicated characters from a String while keeping the last occurrences, with a regex:

String method(String s){
  return s.replaceAll("(.)(?=.*\\1)", "");
}

// Example input: "Type unique chars!"
// Output:        "Typnique chars!"

As for my question: Is it possible, with a regex, to remove duplicated characters from a String, but keep the first occurrences instead of the last?

As for why I'm asking: I came across this codegolf answer using the following function (based on the first example above):

String f(char[]s){String t="";for(char c:s)t+=t.contains(c+"")?"":c;return t;}

and I was wondering whether this could be made shorter with a regex and a String input. But even if it turns out longer, I'm curious in general whether it's possible to remove duplicated characters from a String with a regex while keeping the first occurrence of each character.

Wiktor Stribiżew · Accepted Answer

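This is not the shortest option, and it does not rely on a regex alone, but it works: reverse the string, run the regex you already have (which keeps the last occurrences), and then reverse the result back. The last occurrences in the reversed string are the first occurrences in the original.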

public static String g(StringBuilder s){
  return new StringBuilder(
   s.reverse().toString()
     .replaceAll("(?s)(.)(?=.*\\1)", ""))
     .reverse().toString();
}
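
For anyone who wants to try this with a plain String input, as the question asks, here is a minimal sketch of the same reverse, replace, reverse idea. The class name KeepFirstDemo and the method name keepFirst are placeholders of mine, not part of the answer above.

```java
class KeepFirstDemo {
    // Keeps the first occurrence of each character by reversing the input,
    // removing every character that occurs again later, and reversing back.
    static String keepFirst(String s) {
        String reversed = new StringBuilder(s).reverse().toString();
        // "\\1" in Java source is the regex backreference \1
        String deduped = reversed.replaceAll("(?s)(.)(?=.*\\1)", "");
        return new StringBuilder(deduped).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(keepFirst("Type unique chars!")); // Type uniqchars!
    }
}
```

Unlike g above, this version copies the input into a new StringBuilder, so the caller's data is not reversed in place.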


Note that I suggest adding (?s) (the inline form of the Pattern.DOTALL flag) to the regex so that . can match any character, including a newline. By default, . does not match line terminators, so a duplicate on a different line would not be found.
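
To illustrate the difference the flag makes, here is a small sketch that runs the same reverse, replace, reverse approach with and without (?s) on a two-line input. The class name DotAllDemo, the method keepFirst, and its dotAll parameter are hypothetical names used only for this comparison.

```java
class DotAllDemo {
    // Same reverse/replace/reverse approach; dotAll toggles the (?s) prefix.
    static String keepFirst(String s, boolean dotAll) {
        String regex = (dotAll ? "(?s)" : "") + "(.)(?=.*\\1)";
        String reversed = new StringBuilder(s).reverse().toString();
        return new StringBuilder(reversed.replaceAll(regex, "")).reverse().toString();
    }

    public static void main(String[] args) {
        String input = "ab\nab";
        // With (?s) the lookahead can cross the newline, so the second "ab" is removed.
        System.out.println(keepFirst(input, true).equals("ab\n"));  // true
        // Without (?s) the lookahead stops at the newline, so nothing is removed.
        System.out.println(keepFirst(input, false).equals(input));  // true
    }
}
```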
