Viorel Morari

Reputation: 547

Ruta in UIMA Environment. Working with predefined collections/sets and lexicons in plain Java

I'm a beginner with Ruta, and what I'm trying to grasp now is how to handle class variables/collections within a UIMA environment (in plain Java). I've tried following the examples in the documentation, but there the Ruta rules are applied either externally as a script file or right "on the spot" using Ruta.apply(cas, rule). Neither option lets me use, for example, a file lexicon or any predefined Java collections. Could you please give me any hints or solutions?

Generally, I'm using UIMA AEs to parse sentences and then use the created annotations within a Ruta script to match specific types of sentences based on their syntactic structure. The Ruta rules I write are therefore fairly simple but bulky because of the set of POS tags, so I would like more flexibility inside Ruta. I would be grateful for any suggestions on this topic as well.

EDIT: For example, I have a rule that considers a set of POS tags created by an AE (Stanford Parser). To match the desired sentence structure, I would hardcode it in the following way (I realize it's the most naive way):

String rutaSampleRule = "BLOCK(ForEach) Sentence{}{Document{-> Asyndeton} "
    + "<- {((Constituent.label==\"NN\" COMMA Constituent.label==\"NN\") |"
    + " (Constituent.label==\"NNP\" COMMA Constituent.label==\"NNP\") |"
    + " (Constituent.label==\"NNPS\" COMMA Constituent.label==\"NNPS\") |"
    + " (Constituent.label==\"NNS\" COMMA Constituent.label==\"NNS\"));};}";
Ruta.apply(cas, rutaSampleRule);

Now, what I would like instead is to declare a collection of such POS tags (e.g. NNS, NN), iterate over it inside Ruta, and match the respective sentence structure (here, consecutive nouns). This would make my rules much more flexible and practical.
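
As a stopgap that stays entirely in plain Java, the hardcoded rule above could be generated from a collection instead of being typed out. The following is only a sketch: the class name AsyndetonRuleBuilder and the method buildRule are made up for illustration, and it reproduces the exact rule shape shown above rather than iterating inside Ruta.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Ruta): builds the disjunctive rule text
// from a list of POS tags, mirroring the hardcoded rule in the question.
class AsyndetonRuleBuilder {

    static String buildRule(List<String> posTags) {
        // One "(label==TAG COMMA label==TAG)" alternative per tag, joined with " | ".
        String alternatives = posTags.stream()
            .map(tag -> "(Constituent.label==\"" + tag + "\" COMMA Constituent.label==\"" + tag + "\")")
            .collect(Collectors.joining(" | "));
        // Wrap the alternatives in the same BLOCK / inlined-rule frame as the original string.
        return "BLOCK(ForEach) Sentence{}{Document{-> Asyndeton} <- {(" + alternatives + ");};}";
    }

    public static void main(String[] args) {
        List<String> posTags = Arrays.asList("NN", "NNP", "NNPS", "NNS");
        // The resulting string would be passed to Ruta.apply(cas, rule) as before.
        System.out.println(buildRule(posTags));
    }
}
```

The tags are inserted into the rule text verbatim, so this only makes sense for trusted, fixed tag sets.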

The second option would be to use lexicons instead of a collection, but I thought they could only be used (with MARKFAST) within Ruta itself, not from plain Java; at least I could not find any examples.

So, to summarize my question: is it possible (and if so, how), within simple Ruta scripts that do not introduce any new types, to work with externally defined collections/lexicons in plain Java?

I hope I managed to explain it better. Thanks in advance.

EDIT 1: I figured out how to use lexicons in plain Java just by playing around with paths and the example in the guide book. Still, I would like to know how to assign values to variables using the configuration parameters.

Upvotes: 2

Views: 107

Answers (1)

Peter Kluegl

Reputation: 3113

This should do the trick (tested with current trunk):

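import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.commons.lang3.StringUtils; // assuming Apache Commons Lang 3
import org.apache.uima.ruta.engine.Ruta;
import org.apache.uima.ruta.engine.RutaEngine;

String rutaSampleRule = "STRINGLIST posList;"
    + "Sentence{-> Asyndeton} <- {"
    + "c1:Constituent{CONTAINS(posList, c1.label)} COMMA c2:Constituent{c2.label == c1.label};"
    + "};";

// Fill the STRINGLIST declared in the rule via the two configuration parameters.
List<String> posList = Arrays.asList(new String[] { "NN", "NNP", "NNPS", "NNS" });
Map<String, Object> additionalParams = new HashMap<>();
additionalParams.put(RutaEngine.PARAM_VAR_NAMES, new String[] { "posList" });
additionalParams.put(RutaEngine.PARAM_VAR_VALUES, new String[] { StringUtils.join(posList, ",") });
Ruta.apply(cas, rutaSampleRule, additionalParams);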

Some comments:

  • A STRINGLIST is declared in the rules and filled by using the two config parameters.
  • I refactored the inlined rules: no disjunctive composed rule element is required (several rules would do the same), and no multiple rule elements/rules are required.
  • A block is no longer required in the example, so I removed it.
  • If this causes problems with a released version of Ruta, the rule needs to be rewritten to use a string variable instead of directly comparing the label features.
  • An approach using an external dictionary would look quite similar, e.g., with an INLIST condition.
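
If several Java collections need to be handed over this way, the two parameter arrays can be built in one place. The following is a rough sketch, not part of Ruta: the class RutaVarParams and the method toParams are hypothetical, and the literals "varNames" and "varValues" are assumed to match RutaEngine.PARAM_VAR_NAMES and RutaEngine.PARAM_VAR_VALUES (use the constants in real code).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns named Java lists into the two parallel
// String[] parameters used in the snippet above.
class RutaVarParams {

    // Assumption: these mirror RutaEngine.PARAM_VAR_NAMES / PARAM_VAR_VALUES.
    static final String PARAM_VAR_NAMES = "varNames";
    static final String PARAM_VAR_VALUES = "varValues";

    static Map<String, Object> toParams(Map<String, List<String>> variables) {
        String[] names = new String[variables.size()];
        String[] values = new String[variables.size()];
        int i = 0;
        for (Map.Entry<String, List<String>> entry : variables.entrySet()) {
            names[i] = entry.getKey();
            // List values are passed as one comma-separated string, as in the answer.
            values[i] = String.join(",", entry.getValue());
            i++;
        }
        Map<String, Object> params = new LinkedHashMap<>();
        params.put(PARAM_VAR_NAMES, names);
        params.put(PARAM_VAR_VALUES, values);
        return params;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps names and values in the same, predictable order.
        Map<String, List<String>> variables = new LinkedHashMap<>();
        variables.put("posList", Arrays.asList("NN", "NNP", "NNPS", "NNS"));
        Map<String, Object> params = toParams(variables);
        System.out.println(Arrays.toString((String[]) params.get(PARAM_VAR_NAMES)));
        System.out.println(Arrays.toString((String[]) params.get(PARAM_VAR_VALUES)));
    }
}
```

The resulting map would be passed as the third argument to Ruta.apply(cas, rutaSampleRule, params), with each variable declared in the rules (e.g. STRINGLIST posList;). Since the values are joined with commas, entries that themselves contain a comma would not survive this encoding.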

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 1
