Reputation: 15
I want to replace vowels in the words of a sentence with Arabic Unicode characters. My input looks like this:
each vocal of letter each word in a sentence change into unicode use substring on java
and these are the replacement rules:
String[][] replacements1 = {
    {"a", "\u0627"},
    {"i", "\u0627\u064A"},
    {"u", "\u0627\u0648"},
    {"e", "\u0627\u064A"},
    {"o", "\u0627\u0648"}
};
I first split the sentence on whitespace into an array with .split(" "), but that didn't work. I then switched to charAt(), but because each replacement is more than one char (a string), I think I need a regex that matches element 0 of each pair in replacements1[][] without affecting the other vowels in the word. How can I do this?
Output should be like this:
\u0627\u064Aach vocal \u0627\u0648f letter \u0627\u064Aach word \u0627\u064An \u0627 sentence change \u0627\u064Anto unicode \u0627\u0648se substring \u0627\u0648n java
Upvotes: 1
Views: 428
Reputation: 425198
Use a Matcher to find the first vowel in each word, based on the regex "\\b([^aeiou]*)([aeiou])(\\w*)\\b" (which also captures the other parts of the word). Then use the appendReplacement() and appendTail() methods provided by Matcher to build up the replaced string.
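import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "each vocal of letter each word in a sentence change into unicode use substring on java";
// Map each lowercase vowel to its Arabic replacement.
Map<String, String> replacements = new HashMap<String, String>() {{
    put("a", "\u0627");
    put("i", "\u0627\u064A");
    put("u", "\u0627\u0648");
    put("e", "\u0627\u064A");
    put("o", "\u0627\u0648");
}};
// Groups: 1 = text before the word, 2 = leading non-vowels, 3 = first vowel, 4 = rest of the word.
Pattern pattern = Pattern.compile("(?i)(.*?)\\b([^aeiou]*)([aeiou])(\\w*)\\b");
Matcher matcher = pattern.matcher(str);
StringBuffer buf = new StringBuffer();
while (matcher.find()) {
    // (?i) also matches uppercase vowels, so lower-case the key before the map lookup.
    String vowel = matcher.group(3).toLowerCase();
    matcher.appendReplacement(buf, "$1$2" + replacements.get(vowel) + "$4");
}
// Copy whatever follows the last match.
matcher.appendTail(buf);
String replaced = buf.toString();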
The above code has been tested and produces the desired result.
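Note that the expected output in the question leaves words such as "vocal", "letter" and "java" untouched, which suggests that only a vowel at the very start of a word should be replaced. If that is the intent, a simpler pattern may be enough. Here is a minimal sketch under that assumption (the class and method names are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeadingVowelReplacer {
    // Same mapping as in the question; keys are lowercase vowels.
    private static final Map<String, String> REPLACEMENTS = new HashMap<>();
    static {
        REPLACEMENTS.put("a", "\u0627");
        REPLACEMENTS.put("i", "\u0627\u064A");
        REPLACEMENTS.put("u", "\u0627\u0648");
        REPLACEMENTS.put("e", "\u0627\u064A");
        REPLACEMENTS.put("o", "\u0627\u0648");
    }

    // A word boundary followed by a vowel: matches only a vowel that starts a word.
    private static final Pattern LEADING_VOWEL = Pattern.compile("(?i)\\b[aeiou]");

    public static String replaceLeadingVowels(String input) {
        Matcher m = LEADING_VOWEL.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Lower-case the match so uppercase vowels find their map entry too.
            String vowel = m.group().toLowerCase();
            m.appendReplacement(out, Matcher.quoteReplacement(REPLACEMENTS.get(vowel)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLeadingVowels("each vocal of letter"));
    }
}
```

Matcher.quoteReplacement() is not strictly needed for these particular replacement strings, but it guards against a `$` or backslash in a replacement being interpreted as a group reference.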
BTW, when testing I removed the backslashes from the replacement strings so that no Arabic characters were inserted and I could see that the logic worked; it's hard to see what's going on when printing a mix of right-to-left and left-to-right characters.
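Another way to inspect mixed right-to-left and left-to-right output, without editing the replacement strings, is to escape the non-ASCII characters when printing. This is a rough sketch; the helper class and method names are hypothetical:

```java
public class UnicodeEscaper {
    // Render every non-ASCII char as a backslash-u escape with four hex digits;
    // ASCII characters pass through unchanged.
    public static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Arabic characters are printed in their escaped form, followed by "ach".
        System.out.println(escapeNonAscii("\u0627\u064Aach"));
    }
}
```

Printing escapeNonAscii(replaced) gives a left-to-right string that can be compared directly with the escaped form used in the question.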
Upvotes: 1