tacos_tacos_tacos

Reputation: 10585

Using Ruta TextRuler outside of the Eclipse GUI

I have a few questions regarding UIMA Ruta TextRuler.

Is it possible (I assume it should be) to use the TextRuler outside of the Eclipse GUI? If so, can you provide an example of calling it outside of the GUI that includes the following parameters (as they are the same ones that are provided to the GUI for TextRuler):

Second, within the latest release version of the Eclipse GUI, is there any way to view the progress of the rule generation, or to stop and reset the running rule generation methods once they seem to have encountered an error? Sometimes I start it and it just stays on "MethodsRunning..." with "Loading AE" next to each analysis engine, but with no CPU utilization.

Third, is there any way within the Eclipse GUI to dedicate more CPU resources to the extraction process or to otherwise speed up the execution of the TextRuler generation? It seems to max out at around 25% utilization so there are plenty of CPU resources available (I wonder if it is limited to a single core).

Finally, are there any further resources on Ruta TextRuler available online outside of the Ruta documentation? I've had a hard time coming by any further material on TextRuler and would love to learn more. In particular I would like to learn more about the parameterization of each learning model.

Upvotes: 2

Views: 149

Answers (1)

Peter Kluegl

Reputation: 3113

Unfortunately, it is not possible to use Ruta TextRuler outside of Eclipse right now (UIMA Ruta 2.3.1). There is an open issue for this, but it has never been a high priority: UIMA-2705

The TextRuler framework is in general not well maintained and there is not much documentation. There is an additional section on how to get the example project up and running: TextRuler HowTo

The additional data folder is a configuration parameter for rule learning algorithms that require more data than just a gold standard. Right now, this applies only to the TraBaL algorithm. For TraBaL, this folder holds annotated input XMI files that contain erroneous annotations. The algorithm tries to learn correction rules that transform these annotations into the gold standard annotations given in the training data folder.

The absolute and relative progress depend on the configured algorithms. There is progress information for the intermediate steps of the algorithms. In general, it should be possible to stop the execution of the algorithms. The behavior you describe sounds like a bug. I will take a look at it when I find the time.

Parallelizing a single rule learning algorithm is currently not supported. If several algorithms are selected, they are executed in separate threads.
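
To illustrate the threading model described here, the following is a minimal Java sketch of "one thread per selected algorithm". It is not TextRuler code: the class name, the method names, and the learner labels are all hypothetical, and the learner body is a placeholder. The point is only that each task runs sequentially on its own thread, so a single selected algorithm keeps at most one core busy. On a four-core machine that would be consistent with the roughly 25% utilization mentioned in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerAlgorithmThreads {

    // Hypothetical stand-in for one rule learner. The work inside is
    // sequential, so it can never occupy more than one core.
    static String runLearner(String name) {
        long checksum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            checksum += i % 7; // placeholder for the actual learning work
        }
        return name + " finished on " + Thread.currentThread().getName()
                + " (checksum " + checksum + ")";
    }

    // Starts one thread per selected learner and waits for all of them.
    static List<String> runAll(List<String> learners) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, learners.size()));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String name : learners) {
                futures.add(pool.submit(() -> runLearner(name)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until that learner is done
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // The labels are illustrative only.
        for (String line : runAll(List.of("learner-1", "learner-2", "learner-3"))) {
            System.out.println(line);
        }
    }
}
```

Under this model, selecting more algorithms can use more cores in total, but it does not make any individual algorithm finish sooner.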

The Ruta documentation is the main source of information I know of. There are of course the publications of the algorithms mentioned in the documentation, which contain more detailed information about the algorithms themselves. There is also the old publication about TextRuler. The parameters are normally direct translations of the parameters of the original algorithms described in the papers.

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 1
