Reputation: 21
I tested a Ruta script on text in two languages (English and Korean). I expected the same result for both: the text split into words. The English sentence was split as expected, but the Korean sentence was not split into words at all.
Script : DECLARE Last1; W {-> Last1};
Document : "This is a sample."
Result : This , is , a , sample
Document : "이것은 샘플입니다."
Result : (nothing)
Expected result : 이것은 , 샘플입니다
No annotations are created for the Korean sentence. How can I get Ruta to detect non-English words as words?
Any help would be appreciated.
Upvotes: 1
Views: 58
Reputation: 21
I solved it using SPLIT:
Sentence{-> SPLIT(SPACE)};
(Apache UIMA ruta-core 2.6.1)
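For comparison outside of Ruta, here is a minimal plain-Java sketch of the segmentation I am after. It does not use any Ruta or UIMA API and says nothing about how W is implemented; the class name LetterRunSplitter and the method words are made up for this illustration. It treats every run of Unicode letters (\p{L}) as one word, so Hangul is handled the same way as Latin text and the trailing period is dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Ruta or UIMA.
class LetterRunSplitter {
    // \p{L} matches any Unicode letter, so Hangul syllables count as letters too.
    private static final Pattern LETTER_RUN = Pattern.compile("\\p{L}+");

    // Returns every maximal run of Unicode letters found in the text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LETTER_RUN.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("This is a sample."));   // [This, is, a, sample]
        System.out.println(words("이것은 샘플입니다."));      // [이것은, 샘플입니다]
    }
}
```

Unlike splitting on spaces, this does not leave the period attached to the last word.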
Anyway, I would still like to know how to separate Unicode words using the reserved keyword "W".
Upvotes: 1