Dzulfikar Fatahillah

Reputation: 15

Replace the first vowel of each word with a regex matching rule

I want to replace the first vowel of each word in a sentence with Arabic Unicode characters. My input looks like this:

each vocal of letter each word in a sentence change into unicode use substring on java

and these are the substring replacement rules:

String[][] replacements1 = {
    {"a", "\u0627"},
    {"i", "\u0627\u064A"},
    {"u", "\u0627\u0648"},
    {"e", "\u0627\u064A"},
    {"o", "\u0627\u0648"}
};

I first split the sentence on whitespace into an array with .split(" "), but that didn't work. I then switched to charAt(), but the replacements are strings of more than one char, so I think I need a regex that targets index 0 of each word and applies the matching entry from replacements1[][] without affecting the other vowels in the word. How can I do this?

Output should be like this:

\u0627\u064Aach vocal \u0627\u0648f letter \u0627\u064Aach word \u0627\u064An \u0627 sentence change \u0627\u064Anto unicode \u0627\u0648se substring \u0627\u0648n java

Upvotes: 1

Views: 428

Answers (1)

Bohemian

Reputation: 425198

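Use a Matcher to find the first vowel in each word, based on the regex "(?i)(.*?)\\b([^aeiou]*)([aeiou])(\\w*)\\b". Besides the vowel itself (group 3), it captures the text before the word (group 1), the characters leading up to the vowel (group 2), and the rest of the word (group 4), so they can be copied back into the result.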
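
To see what the capture groups hold, a small sketch like the one below can help. It uses the same regex as in this answer; the class name FirstVowelGroups and the method describe() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstVowelGroups {
    // Same pattern as in the answer: prefix, leading non-vowels, first vowel, rest of word.
    static final Pattern PATTERN =
            Pattern.compile("(?i)(.*?)\\b([^aeiou]*)([aeiou])(\\w*)\\b");

    // Returns one "[g1|g2|g3|g4]" entry per match so the groups can be inspected.
    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            out.add("[" + m.group(1) + "|" + m.group(2) + "|"
                    + m.group(3) + "|" + m.group(4) + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String entry : describe("each vocal of")) {
            System.out.println(entry);
        }
    }
}
```

For "each vocal of" this gives [||e|ach], [| v|o|cal] and [| |o|f]. Note that group 2 also picks up the space before a word, because [^aeiou] matches any character that is not a vowel, not only letters.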

Matcher's appendReplacement() and appendTail() methods make it easy to build up the replaced string:

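import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "each vocal of letter each word in a sentence change into unicode use substring on java";

// Map each lowercase vowel to its Arabic replacement.
Map<String, String> replacements = new HashMap<>();
replacements.put("a", "\u0627");
replacements.put("i", "\u0627\u064A");
replacements.put("u", "\u0627\u0648");
replacements.put("e", "\u0627\u064A");
replacements.put("o", "\u0627\u0648");

Pattern pattern = Pattern.compile("(?i)(.*?)\\b([^aeiou]*)([aeiou])(\\w*)\\b");
Matcher matcher = pattern.matcher(str);
StringBuffer buf = new StringBuffer();
while (matcher.find()) {
    // (?i) also matches uppercase vowels, so normalize before the map lookup.
    String vowel = matcher.group(3).toLowerCase(Locale.ROOT);
    matcher.appendReplacement(buf, "$1$2" + replacements.get(vowel) + "$4");
}
// Copy whatever follows the last match.
matcher.appendTail(buf);
String replaced = buf.toString();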

The above code has been tested and produces the desired result.
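
As an aside, on Java 9 or later the same idea can probably be written with Matcher.replaceAll(Function). This is only a sketch under that assumption, not a replacement for the code above: the class name FirstVowelReplaceAll is hypothetical, and the pattern is a shortened variant that captures only the leading non-vowel word characters and the first vowel.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstVowelReplaceAll {
    static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("a", "\u0627");
        REPLACEMENTS.put("i", "\u0627\u064A");
        REPLACEMENTS.put("u", "\u0627\u0648");
        REPLACEMENTS.put("e", "\u0627\u064A");
        REPLACEMENTS.put("o", "\u0627\u0648");
    }

    // Group 1: non-vowel word characters at the start of a word; group 2: the first vowel.
    static final Pattern FIRST_VOWEL = Pattern.compile("(?i)\\b([^\\Waeiou]*)([aeiou])");

    static String replace(String input) {
        return FIRST_VOWEL.matcher(input).replaceAll(mr -> {
            String vowel = mr.group(2).toLowerCase(Locale.ROOT);
            // quoteReplacement keeps '$' and backslashes from being read as group references.
            return Matcher.quoteReplacement(mr.group(1) + REPLACEMENTS.get(vowel));
        });
    }

    public static void main(String[] args) {
        System.out.println(replace("each vocal of letter"));
    }
}
```

Words without any vowel are left alone, since the pattern only matches when a vowel follows the leading consonants.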


BTW, when testing I removed the backslashes from the replacement strings so that no Arabic characters were inserted and I could see that the logic worked; it's hard to see what's going on when printing a mix of right-to-left and left-to-right characters.
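
Another way to inspect mixed-direction output is to print every non-ASCII character as a \uXXXX escape. The sketch below does that, and it also shows a variant that replaces a vowel only when it is the first letter of a word, which is what the sample output in the question seems to show for words such as "vocal" and "letter". Whether that is the intended rule is an assumption; the class and method names (InitialVowelDemo, replaceInitialVowels, escape) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InitialVowelDemo {
    static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("a", "\u0627");
        REPLACEMENTS.put("i", "\u0627\u064A");
        REPLACEMENTS.put("u", "\u0627\u0648");
        REPLACEMENTS.put("e", "\u0627\u064A");
        REPLACEMENTS.put("o", "\u0627\u0648");
    }

    // A vowel directly after a word boundary, i.e. a word-initial vowel.
    static final Pattern INITIAL_VOWEL = Pattern.compile("(?i)\\b[aeiou]");

    // Assumed rule: only a vowel at index 0 of a word is replaced.
    static String replaceInitialVowels(String input) {
        Matcher m = INITIAL_VOWEL.matcher(input);
        StringBuffer buf = new StringBuffer();
        while (m.find()) {
            String vowel = m.group().toLowerCase(Locale.ROOT);
            m.appendReplacement(buf, Matcher.quoteReplacement(REPLACEMENTS.get(vowel)));
        }
        m.appendTail(buf);
        return buf.toString();
    }

    // Renders each non-ASCII char as a backslash-u escape so the text reads left to right.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape(replaceInitialVowels("each vocal of a")));
    }
}
```

For the input "each vocal of a", the escaped result is \u0627\u064Aach vocal \u0627\u0648f \u0627. Unlike the sample output in the question, this variant would also replace the leading "u" of "unicode".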

Upvotes: 1
