KursikS

Reputation: 326

How to remove acute accents from a string in Java?

I know about this approach:

public static String stripAccents(String s) {
    s = Normalizer.normalize(s, Normalizer.Form.NFD);
    s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
    return s;
}

but it does not work the way I want. It strips every diacritic, which changes the meaning of the text:

stripAccents("йод,ëлка,wäre") //иод,елка,ware

I want to remove only acute accents:

stripAccents("café") //cafe
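To see why the regex version also alters й and ä, it can help to print the code points that NFD produces. The sketch below is illustrative only: the class name `DecompositionDemo` and the helper `nfdCodePoints` are hypothetical names, not part of the question. The inputs are written as Unicode escapes (й is U+0439, ä is U+00E4, é is U+00E9).

```java
import java.text.Normalizer;

public class DecompositionDemo {

    /** Returns the NFD code points of s as space-separated U+XXXX values. */
    public static String nfdCodePoints(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        decomposed.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nfdCodePoints("\u0439")); // U+0438 U+0306 (base letter + combining breve)
        System.out.println(nfdCodePoints("\u00e4")); // U+0061 U+0308 (a + combining diaeresis)
        System.out.println(nfdCodePoints("\u00e9")); // U+0065 U+0301 (e + combining acute)
    }
}
```

The breve (U+0306), diaeresis (U+0308) and acute (U+0301) all fall in the Combining Diacritical Marks block, so the `\p{InCombiningDiacriticalMarks}` pattern removes all three rather than only the acute.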

Upvotes: 1

Views: 2502

Answers (2)

Joop Eggen

Reputation: 109623

Just for the acute accents:

s = Normalizer.normalize(s, Normalizer.Form.NFD); // Decompose
s = s.replace("\u0301", ""); // Combining acute accent (´)
s = Normalizer.normalize(s, Normalizer.Form.NFC); // Compose again

Recomposing at the end (NFC) gives the shortest form, which fonts also tend to render better.

This removes the zero-width combining acute accents without needing a regex.

For the grave accent (as in Italian caffè), use \u0300 instead.
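The three steps above could be wrapped in a helper along the following lines. This is a minimal sketch: the class name `AcuteAccentStripper` and the method name `stripAcute` are made up for illustration, and the null/empty guard is an addition that the snippet above does not have.

```java
import java.text.Normalizer;

public class AcuteAccentStripper {

    // U+0301 COMBINING ACUTE ACCENT
    private static final String COMBINING_ACUTE = "\u0301";

    /** Removes only the acute accent; other diacritics are recomposed unchanged. */
    public static String stripAcute(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Decompose: each accented letter becomes base letter + combining mark(s)
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Drop the combining acute only (plain replace, no regex)
        String stripped = decomposed.replace(COMBINING_ACUTE, "");
        // Recompose the remaining marks (breve, diaeresis, ...) with their base letters
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(stripAcute("caf\u00e9")); // prints cafe
        // U+00E4 (a with diaeresis) is left untouched
        System.out.println(stripAcute("w\u00e4re").equals("w\u00e4re")); // prints true
    }
}
```

Two caveats with this approach: any base letter followed by U+0301 is affected, so the Greek tonos (as in ά) is stripped too, and the final NFC pass normalizes the rest of the string as well.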

Upvotes: 2

Nowhere Man

Reputation: 19575

It may be simpler to remap the specific set of letters that carry an acute accent to their plain counterparts:

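import java.util.stream.Collector;

public static String stripAccents(String s) {

    if (null == s || s.isEmpty()) {
        return s;
    }

    // map[0]: letters with an acute accent; map[1]: plain replacements at the same index
    final String[] map = {
        "ÁÉÍÓÚÝáéíóúý",
        "AEIOUYaeiouy"
    };

    // Replace each mapped char, pass everything else through, then join into a String
    return s.chars()
            .mapToObj(c -> (char)(map[0].indexOf(c) > -1 ? map[1].charAt(map[0].indexOf(c)) : c))
            .collect(Collector.of(
                StringBuilder::new, StringBuilder::append,
                StringBuilder::append, StringBuilder::toString
            ));
}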

// or using a switch expression (preview in JDK 12, standard since JDK 14)
public static String stripAcuteAccents(String s) {
    if (null == s || s.isEmpty()) {
        return s;
    }
    char[] raw = s.toCharArray();
    for (int i = 0; i < raw.length; i++) {
        raw[i] = switch(raw[i]) {
            case 'Á' -> 'A'; case 'É' -> 'E'; case 'Í' -> 'I';
            case 'Ó' -> 'O'; case 'Ú' -> 'U'; case 'Ý' -> 'Y'; 
            case 'á' -> 'a'; case 'é' -> 'e'; case 'í' -> 'i';
            case 'ó' -> 'o'; case 'ú' -> 'u'; case 'ý' -> 'y';
            default -> raw[i];
        };
    }
    return new String(raw);
}

Basic tests:

String[] tests = {"café", "Á Toi", "ÁÉÍÓÚÝáéíóúý - bcdef"};
   
Arrays.stream(tests)
      .forEach(s -> System.out.printf("%s -> %s%n", s, stripAccents(s)));

Output:

café -> cafe
Á Toi -> A Toi
ÁÉÍÓÚÝáéíóúý - bcdef -> AEIOUYaeiouy - bcdef
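One limitation worth noting: a lookup table like this only matches precomposed letters such as U+00E9. If the input arrives decomposed (e followed by U+0301), no character matches and the accent stays. A possible variation, sketched below under the assumption that normalizing the whole input to NFC first is acceptable, handles both forms. The class name `AcuteMapStripper` and its constants are hypothetical, and the accented letters are written as Unicode escapes so the source does not depend on file encoding.

```java
import java.text.Normalizer;

public class AcuteMapStripper {

    // Same twelve letters as in the map above (A, E, I, O, U, Y with acute, upper then lower case)
    private static final String ACCENTED =
            "\u00c1\u00c9\u00cd\u00d3\u00da\u00dd\u00e1\u00e9\u00ed\u00f3\u00fa\u00fd";
    private static final String PLAIN = "AEIOUYaeiouy";

    public static String stripAcuteAccents(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Compose first so that "e" + U+0301 becomes the single char U+00E9
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            int idx = ACCENTED.indexOf(c);
            // Mapped letters are replaced; everything else is copied as-is
            out.append(idx >= 0 ? PLAIN.charAt(idx) : c);
        }
        return out.toString();
    }
}
```

Letters outside the table that carry an acute accent (for example ć or ń) are still left alone, which may or may not be what is wanted.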

Upvotes: 1
