Reputation: 326
I know about this approach:
public static String stripAccents(String s) {
s = Normalizer.normalize(s, Normalizer.Form.NFD);
s = s.replaceAll("[\\p{InCombiningDiacriticalMarks}]", "");
return s;
}
but it does not work the way I want. It strips every diacritic, which changes the meaning of the text:
stripAccents("йод,ëлка,wäre") //иод,елка,ware
I want to remove only acute accents:
stripAccents("café") //cafe
Upvotes: 1
Views: 2502
Reputation: 109623
Just for the acute accents:
s = Normalizer.normalize(s, Normalizer.Form.NFD); // Decompose
s = s.replace("\u0301", ""); // Combining acute accent (´)
s = Normalizer.normalize(s, Normalizer.Form.NFC); // Compose again
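Recomposing to NFC gives the shortest form, and composed characters are often rendered better by fonts.
Because the combining acute accent is a single zero-width code point, a plain replace removes it; no regex is needed.
For a grave accent, as in Italian caffè, use \u0300 instead.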
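Wrapped up as a complete method, the three steps above might look like the following sketch. The class name `AcuteAccentStripper` and the method name `stripAcute` are placeholders chosen for illustration, not names from the question.

```java
import java.text.Normalizer;

public class AcuteAccentStripper {

    // Hypothetical helper: removes only U+0301 (combining acute accent)
    // and leaves every other combining mark in place.
    public static String stripAcute(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // 1. Decompose: each accented letter becomes base letter + combining mark(s).
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // 2. Drop the combining acute accent only.
        String withoutAcute = decomposed.replace("\u0301", "");
        // 3. Recompose the remaining marks (diaeresis, breve, ...) with their base letters.
        return Normalizer.normalize(withoutAcute, Normalizer.Form.NFC);
    }
}
```

With this, `stripAcute("café")` should return `cafe`, while `йод`, `ëлка` and `wäre` should come back unchanged, because their marks are U+0306 and U+0308 rather than U+0301. One thing to keep in mind: the final NFC step normalizes the whole string, so input that was not already in NFC may differ from the original in places that had no acute accent.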
Upvotes: 2
Reputation: 19575
It seems better to simply map the specific set of precomposed characters with an acute accent to their plain letters:
// requires: import java.util.stream.Collector;
public static String stripAccents(String s) {
if (null == s || s.isEmpty()) {
return s;
}
final String[] map = {
"ÁÉÍÓÚÝáéíóúý",
"AEIOUYaeiouy"
};
return s.chars()
.mapToObj(c -> (char)(map[0].indexOf(c) > -1 ? map[1].charAt(map[0].indexOf(c)) : c))
.collect(Collector.of(
StringBuilder::new, StringBuilder::append,
StringBuilder::append, StringBuilder::toString
));
}
// or using a switch expression (preview in JDK 12, standard since JDK 14)
public static String stripAcuteAccents(String s) {
if (null == s || s.isEmpty()) {
return s;
}
char[] raw = s.toCharArray();
for (int i = 0; i < raw.length; i++) {
raw[i] = switch(raw[i]) {
case 'Á' -> 'A'; case 'É' -> 'E'; case 'Í' -> 'I';
case 'Ó' -> 'O'; case 'Ú' -> 'U'; case 'Ý' -> 'Y';
case 'á' -> 'a'; case 'é' -> 'e'; case 'í' -> 'i';
case 'ó' -> 'o'; case 'ú' -> 'u'; case 'ý' -> 'y';
default -> raw[i];
};
}
return new String(raw);
}
Basic tests:
String[] tests = {"café", "Á Toi", "ÁÉÍÓÚÝáéíóúý - bcdef"};
Arrays.stream(tests)
.forEach(s -> System.out.printf("%s -> %s%n", s, stripAccents(s)));
Output:
café -> cafe
Á Toi -> A Toi
ÁÉÍÓÚÝáéíóúý - bcdef -> AEIOUYaeiouy - bcdef
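One caveat with a lookup table like this: it only matches precomposed letters, so input where the accent arrives as a separate combining character (for example `e` followed by U+0301) would pass through untouched. If that can happen, normalizing to NFC before the lookup is one option. The sketch below assumes a hypothetical class name `AcuteRemap` and writes the table with Unicode escapes only to stay independent of the source file encoding.

```java
import java.text.Normalizer;

public class AcuteRemap {

    // Precomposed acute-accented letters (A, E, I, O, U, Y in upper and lower case).
    private static final String ACCENTED =
            "\u00C1\u00C9\u00CD\u00D3\u00DA\u00DD\u00E1\u00E9\u00ED\u00F3\u00FA\u00FD";
    // Replacement letters, index-aligned with ACCENTED.
    private static final String PLAIN = "AEIOUYaeiouy";

    public static String stripAcuteAccents(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        // Compose first so that "e" + U+0301 becomes the single character U+00E9.
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            int idx = ACCENTED.indexOf(c);
            // Replace listed letters; copy everything else unchanged.
            out.append(idx >= 0 ? PLAIN.charAt(idx) : c);
        }
        return out.toString();
    }
}
```

The table still covers only the twelve letters listed, so other acute-accented characters (such as ń or ś) would need to be added explicitly if they matter for the text being processed.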
Upvotes: 1