Hyunmin Lee

Reputation: 21

Apache UIMA Ruta - non-English sentence processing

I tested a Ruta script with documents in two different languages (English and Korean). I expected the same result for both: the text split into words. However, the Korean sentence was not split into words.

Script : DECLARE Last1; W {-> Last1};

Document : "This is a sample."

Result : This , is , a , sample

Document : "이것은 샘플입니다."
Result : "" (nothing)

Expected result : 이것은 , 샘플입니다

The Korean document produces no annotations at all. How can I get Ruta to detect a non-English word as a word?

Any help would be appreciated.

Upvotes: 1

Views: 58

Answers (1)

Hyunmin Lee

Reputation: 21

I solved it using the SPLIT action:

Sentence{-> SPLIT(SPACE)};

(Apache UIMA ruta-core 2.6.1)

However, I would still like to know how to separate Unicode words using the reserved keyword "W".
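
For comparison, here is a minimal plain-Java sketch of the tokenization I am after, independent of Ruta. It does not use "W" or any Ruta API; the class name UnicodeWords and the method words are hypothetical names chosen for this illustration. It relies only on the Unicode letter category \p{L} in java.util.regex, which covers Hangul as well as Latin letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of UIMA Ruta.
class UnicodeWords {
    // \p{L}+ matches a run of letters from any script (Latin, Hangul, ...),
    // so spaces and punctuation act as separators.
    private static final Pattern WORD = Pattern.compile("\\p{L}+");

    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the words of each sample document joined by " , ".
        System.out.println(String.join(" , ", words("This is a sample.")));
        System.out.println(String.join(" , ", words("이것은 샘플입니다.")));
    }
}
```

Unlike splitting on spaces, this also drops the trailing period, which matches the expected result in the question.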

Upvotes: 1
