Magnilex

Reputation: 11978

Replace a character with different characters depending on which character it is

I have searched SO (and Google) but have not found an answer that fully matches my question:

I want to replace all Swedish characters and whitespace in a String with replacement characters. I would like it to work as follows:

å → a, ä → a, ö → o
Å → A, Ä → A, Ö → O
space → -

Can this be achieved with regex (or any other way), and if so, how?

Of course, the method below does the job (and, I know, it could be improved, for example by replacing "å" and "ä" on the same line):

private String changeSwedishCharactersAndWhitespace(String string) {
    // Each call must build on the previous result, not on the original string
    String newString = string.replaceAll("å", "a");
    newString = newString.replaceAll("ä", "a");
    newString = newString.replaceAll("ö", "o");
    newString = newString.replaceAll("Å", "A");
    newString = newString.replaceAll("Ä", "A");
    newString = newString.replaceAll("Ö", "O");
    newString = newString.replaceAll(" ", "-");
    return newString;
}

I know how to use regex to replace, for example, all "å", "ä", or "ö" with "". The question is: how do I use a regex to replace a character with a different replacement depending on which character matched? Surely there is a better way using regex than the approach above?
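One way to do this in a single regex pass, offered as a sketch rather than the definitive answer, is to match every character of interest with one character class and look up each match's replacement in a table, using `Matcher.appendReplacement` and `appendTail`. The class name `SwedishReplace` and the lookup table are illustrative, not from the question.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwedishReplace {

    // One character class matching every character we want to replace.
    private static final Pattern SPECIAL = Pattern.compile("[åäöÅÄÖ ]");

    // Lookup table: matched character -> replacement text.
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("å", "a");
        REPLACEMENTS.put("ä", "a");
        REPLACEMENTS.put("ö", "o");
        REPLACEMENTS.put("Å", "A");
        REPLACEMENTS.put("Ä", "A");
        REPLACEMENTS.put("Ö", "O");
        REPLACEMENTS.put(" ", "-");
    }

    public static String changeSwedishCharactersAndWhitespace(String input) {
        Matcher m = SPECIAL.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Choose the replacement based on which character matched;
            // quoteReplacement guards against '$' or '\' in the replacement text.
            String replacement = REPLACEMENTS.get(m.group());
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Hello-Warld-pa-Aland"
        System.out.println(changeSwedishCharactersAndWhitespace("Hellö Wärld på Åland"));
    }
}
```

On Java 9 and later, `Matcher.replaceAll(Function<MatchResult, String>)` can express the same loop in one call; the `appendReplacement` form above also works on older versions.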

Upvotes: 5

Views: 2630

Answers (4)

Juvanis

Reputation: 25950

I don't think there is a single regex that replaces all these characters at once. You can, however, simplify the replacement work by using a HashMap.

import java.util.HashMap;
import java.util.Map;

Map<String, String> map = new HashMap<String, String>()
                          {{put("ä", "a"); /*put others*/}};

String newString = string;
for (Map.Entry<String, String> entry : map.entrySet())
    newString = newString.replaceAll(entry.getKey(), entry.getValue());
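The map idea can also be applied in a single pass over the characters. This avoids rescanning the string once per map entry, and it avoids treating the keys as regular expressions (`replaceAll` interprets its first argument as a regex, which would matter if a key were ever something like "."). A minimal sketch, assuming all keys are single characters; `CharMapReplace` and `replaceChars` are illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

public class CharMapReplace {

    // Each character to replace maps to its replacement text.
    private static final Map<Character, String> MAP = new HashMap<>();
    static {
        MAP.put('å', "a");
        MAP.put('ä', "a");
        MAP.put('ö', "o");
        MAP.put('Å', "A");
        MAP.put('Ä', "A");
        MAP.put('Ö', "O");
        MAP.put(' ', "-");
    }

    public static String replaceChars(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = MAP.get(c);
            // Characters without a mapping are copied unchanged.
            if (replacement != null) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Smorgasbord-ar-gott"
        System.out.println(replaceChars("Smörgåsbord är gott"));
    }
}
```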

Upvotes: 3

Rodrigo

Reputation: 400

You can write your own mapper using the Matcher.find method:

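import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String from = "äöÂ";   // characters to replace
    String to   = "aoA";   // replacement at the same index
    String testString = "Hellö Wärld";

    // A character class matching any character in 'from'
    Pattern p = Pattern.compile(String.format("[%s]", from));
    Matcher m = p.matcher(testString);
    String result = testString;
    while (m.find()){
        char charFound = m.group(0).charAt(0);
        // Look up the replacement by position and replace every occurrence
        result = result.replace(charFound, to.charAt(from.indexOf(charFound)));
    }

    System.out.println(result);
}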

This will replace

Hellö Wärld

with

Hello Warld
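A variant of the same from/to idea that skips the regex and rewrites the string in a single pass might look like the sketch below. It assumes, as the answer does, that every replacement is exactly one character; `ParallelStringsReplace` and `map` are illustrative names, and the strings are extended here to the Swedish set from the question.

```java
public class ParallelStringsReplace {

    // Position i in FROM is replaced by position i in TO.
    private static final String FROM = "åäöÅÄÖ ";
    private static final String TO   = "aaoAAO-";

    public static String map(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            int idx = FROM.indexOf(chars[i]);
            if (idx >= 0) {
                // Found in FROM: swap in the character at the same index in TO.
                chars[i] = TO.charAt(idx);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // prints "Hello-Warld"
        System.out.println(map("Hellö Wärld"));
    }
}
```

Because nothing is compiled into a character class, characters such as `]` or `^` in `FROM` need no escaping, and the string is scanned once rather than once per match.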

Upvotes: 0

Joop Eggen

Reputation: 109567

For Latin characters with diacritics, Unicode normalization (java.text.Normalizer) can decompose each letter into its base letter followed by a combining diacritic mark, which you can then strip. Something like:

import java.text.Normalizer;
newString = Normalizer.normalize(string,
        Normalizer.Form.NFKD).replaceAll("\\p{M}", "");
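Applied to the question, the normalization approach still needs a separate step for spaces. A sketch combining both, using the same NFKD form as the snippet above; the class name `NormalizeReplace` is just for illustration:

```java
import java.text.Normalizer;

public class NormalizeReplace {

    public static String changeSwedishCharactersAndWhitespace(String input) {
        // NFKD splits e.g. "å" into "a" followed by a combining ring above (U+030A).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        String stripped = decomposed.replaceAll("\\p{M}", "");
        // Normalization does not touch spaces, so replace them separately.
        return stripped.replace(' ', '-');
    }

    public static void main(String[] args) {
        // prints "Asa-gor-smorgasar"
        System.out.println(changeSwedishCharactersAndWhitespace("Åsa gör smörgåsar"));
    }
}
```

Note that this strips diacritics from every letter, not only the Swedish ones (é becomes e), while letters that have no decomposition, such as ø, are left unchanged.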

Upvotes: 6

ShyJ

Reputation: 4640

You can use StringUtils.replaceEach from Apache Commons Lang, like this:

private String changeSwedishCharactersAndWhitespace(String string) {
    String newString = StringUtils.replaceEach (string, 
      new String[] {"å", "ä", "ö", "Å", "Ä", "Ö", " "}, 
      new String[] {"a", "a", "o", "A", "A", "O", "-"});
    return newString;
}

Upvotes: 3
