Reputation: 6522
I have a string that contains a few words. I want to find all the words that contain only characters from the Tamil Unicode block. I am new to JavaScript.
In Go, I do it like this:
tokens := strings.Split(stringContent, delim) // split on delim, say a space
for _, token := range tokens { // like foreach
    r, l := utf8.DecodeRuneInString(token) // first rune and its length in bytes
    if l != 1 { // skip single-byte (ASCII) characters
        if unicode.Is(unicode.Tamil, r) {
            // Tamil word
        }
    }
}
I found that string.split() in JavaScript gives me the individual words based on a delimiter, but I can't work out how to check whether a word consists of Tamil Unicode characters. Can someone help me do this in JavaScript?
Upvotes: 4
Views: 6518
Reputation: 25135
An easy way is to use a regular expression that matches words made of characters in a Unicode range.
Hope this helps: http://kourge.net/projects/regexp-unicode-block
Here is a sample to start with:
"இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS".match(/[\u0B80-\u0BFF]+/g);
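The sample above returns every run of Tamil characters, so it would also pick up the Tamil part of a mixed word. If you want only whole words that are entirely Tamil, closer to the split-then-check loop in the question, something like the following sketch might work. It assumes words are separated by whitespace and that "Tamil" means the U+0B80 to U+0BFF block; findTamilWords is just an illustrative name, not a built-in.

```javascript
// Anchored pattern: the whole word must consist of characters
// from the Tamil block (U+0B80 to U+0BFF), the same range as the sample above.
const TAMIL_WORD = /^[\u0B80-\u0BFF]+$/;

// Hypothetical helper: returns the words of `text` that are entirely Tamil.
// Assumes words are separated by whitespace.
function findTamilWords(text) {
  return text
    .split(/\s+/) // split on runs of whitespace
    .filter((word) => TAMIL_WORD.test(word)); // empty strings fail the test too
}

const words = findTamilWords(
  "இந்தியா ASASAS எறத்தாழ ASSASAS குடியரசு ASWED SAASAS"
);
console.log(words); // the three Tamil words, in order
```

In engines that support Unicode property escapes (ES2018 and later), /^\p{Script=Tamil}+$/u is an alternative to the hard-coded range, though the script property and the block do not cover exactly the same code points.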
Upvotes: 10