Sugunalakshmi Pagemajik

Reputation: 1054

How to retrieve compound words from a string list - UIMA Ruta

Sample Script:

DECLARE Name, TEST;

"Peter" -> Name;
"der Groot" -> Name;
"Robert" -> Name;
"de Leew" -> Name;
"O'Sullivan" -> Name;

STRING s;
STRINGLIST slist;
Name{-> MATCHEDTEXT(s), ADD(slist, s), LOG(s)};
ANY+{INLIST(slist) -> MARK(TEST)};

Received Output:

Peter
Robert

Expected Output:

Peter
der Groot
Robert
de Leew
O'Sullivan

Sample Input:

Peter
der Groot
Robert
de Leew
O'Sullivan

I've tried to mark the values of the string list with an annotation type, but the received output differs from the expected output.

Upvotes: 2

Views: 134

Answers (1)

Peter Kluegl

Reputation: 3113

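The condition at the rule element ANY+ is validated for every single ANY, so it already fails at the first token of a multi-token name (for example "der" in "der Groot"), and the rule only ever matches single tokens such as "Peter" and "Robert".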
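To make the difference concrete, here is a small Python model. It is not Ruta and not how Ruta is implemented; it assumes a simplified tokenizer that splits text into word runs and single punctuation marks (so "O'Sullivan" becomes three tokens), and the function names are hypothetical. Checking the list against each token separately reproduces the received output, while looking up whole token sequences, which is the idea behind a dictionary action like MARKFAST, finds all five names.

```python
import re

def tokenize(text):
    """Simplified stand-in for Ruta's tokens (assumption): runs of word
    characters and single punctuation marks; whitespace is dropped."""
    return re.findall(r"\w+|[^\w\s]", text)

def per_token_inlist(tokens, entries):
    """Models ANY+{INLIST(slist)}: the condition is evaluated for each token
    on its own, so only entries that are exactly one token can match."""
    return [tok for tok in tokens if tok in entries]

def multi_token_lookup(tokens, entries):
    """Models a dictionary lookup over token sequences: an entry matches
    wherever its whole token sequence occurs in the text."""
    entry_tokens = {e: tokenize(e) for e in entries if tokenize(e)}
    matches = []
    i = 0
    while i < len(tokens):
        # Prefer the longest entry that starts at token position i.
        best = None
        for entry, etoks in entry_tokens.items():
            if tokens[i:i + len(etoks)] == etoks:
                if best is None or len(etoks) > len(entry_tokens[best]):
                    best = entry
        if best is None:
            i += 1
        else:
            matches.append(best)
            i += len(entry_tokens[best])
    return matches

names = ["Peter", "der Groot", "Robert", "de Leew", "O'Sullivan"]
document = "Peter\nder Groot\nRobert\nde Leew\nO'Sullivan"
tokens = tokenize(document)

print(per_token_inlist(tokens, set(names)))  # ['Peter', 'Robert']
print(multi_token_lookup(tokens, names))     # all five names, in document order
```

The per-token check fails on "der", "Groot", "O", and so on, because none of those fragments is itself an entry of the list.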

Should the last rule annotate only positions directly after Name annotations?

If not, then you can do something like:

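// collect the covered text of every Name annotation in the string list
Name{-> MATCHEDTEXT(s), ADD(slist, s)};
// annotate every occurrence of the collected strings, including multi-token ones
MARKFAST(TEST, slist);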

If yes, the situation gets more complicated because you do not have candidates with the correct span. You cannot solve this with a combination of ANY and INLIST; you would either need candidates with the correct span or the fragments in the list. I'd rather recommend an additional fixing rule:

Name{-> MATCHEDTEXT(s), ADD(slist, s)};
MARKFAST(TEST, slist);
// remove each TEST whose preceding token is not the end of a Name annotation
ANY{-ENDSWITH(Name)} @TEST{-> UNMARK(TEST)};

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 2
