I am trying to strip accents from a Thai word using the stripAccents function (Apache Commons Lang) in Scala, but it doesn't seem to strip the accents.
import org.apache.commons.lang3.StringUtils.stripAccents
println("stripped string " + stripAccents("CLEกอ่ตัRงขึนในปีR"))
Output (unchanged):
stripped string CLEกอ่ตัRงขึนในปีR
I am running this in IntelliJ on Windows. It strips accents correctly for many other languages, such as German and Dutch. Has anyone faced a similar issue, and how did you resolve it?
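The following sketch, which uses only the JDK, may help show why German works and Thai doesn't. It approximates what stripAccents does internally. I am assuming it NFD-decomposes the string and then removes \p{InCombiningDiacriticalMarks}, which is the block U+0300 to U+036F. The real method also special-cases a few letters such as ł, so this is not an exact copy. The object name WhyThaiSurvives is only for illustration.

```scala
import java.text.Normalizer

object WhyThaiSurvives {
  // Approximation (assumption) of Commons Lang's stripAccents:
  // decompose with NFD, then drop characters in the
  // Combining Diacritical Marks block (U+0300..U+036F).
  def stripLikeCommonsLang(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)
      .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")

  def main(args: Array[String]): Unit = {
    // NFD splits ü into u + U+0308 and é into e + U+0301; both marks are in the block
    println(stripLikeCommonsLang("Müller café"))

    // Thai marks are already separate code points, but they live in the THAI block
    val thai = "กอ่ตั"
    println(stripLikeCommonsLang(thai) == thai)

    // Print each character's code point, Unicode block, and whether it is a non-spacing mark
    thai.foreach { c =>
      val block = Character.UnicodeBlock.of(c)
      val isMark = Character.getType(c) == Character.NON_SPACING_MARK.toInt
      println(f"U+${c.toInt}%04X block=$block nonSpacingMark=$isMark")
    }
  }
}
```

Running it should print "Muller cafe" and then "true". The per-character lines should show that the Thai tone mark and vowel sign are non-spacing marks, but they sit in the THAI block rather than in Combining Diacritical Marks.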
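You can use java.text.Normalizer directly with a broader pattern. As far as I know, stripAccents only removes characters in the Combining Diacritical Marks block (U+0300 to U+036F) after NFD decomposition. Thai tone marks and vowel signs (for example U+0E48) are combining marks too, but they belong to the Thai block, so they survive. Matching the Unicode general category Mark (\p{IsM}) catches them: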
import java.text.Normalizer
val thaiString = "CLEกอ่ตัRงขึนในปีR"
val strippedString = Normalizer.normalize(thaiString, Normalizer.Form.NFD)
.replaceAll("[\\p{InCombiningDiacriticalMarks}\\p{IsM}]+", "")
println(strippedString)
//CLEกอตRงขนในปR
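If you need this in more than one place, a small helper object could look like the sketch below. As far as I can tell, every character in the Combining Diacritical Marks block is in category Mn, so \p{M} on its own should already cover what the combined pattern above matches. Removing all marks also deletes Thai vowel signs such as ี (U+0E35), not only tone marks. The narrower stripThaiToneMarks variant is a hypothetical alternative that removes only the four Thai tone marks, U+0E48 to U+0E4B. The names ThaiMarks, stripAllMarks and stripThaiToneMarks are illustrative.

```scala
import java.text.Normalizer

object ThaiMarks {
  // Remove every combining mark (general category M: Mn, Mc, Me) after NFD.
  // For Thai this removes tone marks AND above/below vowel signs.
  def stripAllMarks(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")

  // Hypothetical narrower variant: drop only the Thai tone marks
  // MAI EK..MAI CHATTAWA (U+0E48..U+0E4B) and keep vowel signs.
  def stripThaiToneMarks(s: String): String =
    s.replaceAll("[\\x{0E48}-\\x{0E4B}]+", "")

  def main(args: Array[String]): Unit = {
    val input = "CLEกอ่ตัRงขึนในปีR"
    println(stripAllMarks(input))      // CLEกอตRงขนในปR
    println(stripThaiToneMarks(input)) // CLEกอตัRงขึนในปีR
  }
}
```

Which variant fits depends on why you are stripping. Removing only the tone marks keeps the text closer to readable Thai. Removing all marks gives the same result as the pattern in the answer.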