samrat1

Reputation: 49

stripAccents on Thai language

I am trying to strip accents from a Thai word in Scala using the stripAccents function from Apache Commons Lang, but the accents are not removed.

import org.apache.commons.lang3.StringUtils.stripAccents
println("stripped string " + stripAccents("CLEกอ่ตัRงขึนในปีR"))

stripped string CLEกอ่ตัRงขึนในปีR

I am running this in IntelliJ on Windows. The same call strips accents correctly for many other languages, such as German and Dutch. Has anyone faced a similar issue, and how did you resolve it?

Upvotes: 0

Views: 347

Answers (1)

blackbishop

Reputation: 32720

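You can use Java's java.text.Normalizer and, after decomposing the string, remove every character in the Unicode Mark category (\p{IsM}), which includes the Thai marks: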

import java.text.Normalizer

val thaiString = "CLEกอ่ตัRงขึนในปีR"

val strippedString = Normalizer.normalize(thaiString, Normalizer.Form.NFD)
                    .replaceAll("[\\p{InCombiningDiacriticalMarks}\\p{IsM}]+", "")

println(strippedString)
//CLEกอตRงขนในปR
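
To illustrate why the original call probably leaves Thai untouched, here is a minimal sketch using only the JDK. It assumes (from my reading of Commons Lang, not confirmed in the question) that stripAccents decomposes with NFD and then removes only characters in the Combining Diacritical Marks block (U+0300 to U+036F). Thai marks such as U+0E48 live in the Thai block, so a block-based pattern would miss them, while a category-based pattern (\p{M}) would catch them. The object and method names below are hypothetical, chosen just for this example.

```scala
import java.text.Normalizer

// Hypothetical helper object for illustration only.
object ThaiMarks {

  // Category-based: NFD-decompose, then drop every Unicode Mark (Mn, Mc, Me).
  def stripMarks(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "")

  // Block-based: mimics the narrower pattern that stripAccents is ASSUMED to use.
  def stripLatinStyleAccents(s: String): String =
    Normalizer.normalize(s, Normalizer.Form.NFD)
      .replaceAll("\\p{InCombiningDiacriticalMarks}+", "")

  // True when the character's general category is Mn (non-spacing mark).
  def isNonSpacingMark(c: Char): Boolean =
    Character.getType(c) == Character.NON_SPACING_MARK.toInt

  def main(args: Array[String]): Unit = {
    val thai = "CLEกอ่ตัRงขึนในปีR"
    println(stripLatinStyleAccents(thai)) // Thai marks are expected to survive
    println(stripMarks(thai))             // Thai marks are removed
    println(isNonSpacingMark('\u0E48'))   // THAI CHARACTER MAI EK
  }
}
```

Since every character in the Combining Diacritical Marks block is already in the Mark category, \p{M} on its own should be enough. Keep in mind that in Thai these marks are vowels and tone marks rather than optional accents, so removing them changes the words themselves; whether that is acceptable depends on what the stripped text is used for.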

Upvotes: 1
