CyKon

Reputation: 153

UIMA Ruta - basic example

I am working through the UIMA Ruta example here.

I want to create a Ruta script and apply it to my text from plain Java, without the Workbench.

1. How do I get the type system descriptor from plain Java (without the Workbench)?
2. When is it generated in the Workbench? (When I "run" the Ruta script, no descriptor is created.)

Upvotes: 1

Views: 1920

Answers (1)

Peter Kluegl

Reputation: 3113

The main question is whether the script declares new types.

If no new types are declared, the linked examples in the documentation should be sufficient.

If new types are declared in the script, then a type system description needs to be created and included in the creation process of the CAS before the script can be applied on the CAS.

The type system description of a script, which contains the descriptions of the types declared within that script, can be created in the following ways:

  • The Ruta Workbench creates the type system description automatically for each script within a simple Ruta Project when the script is saved. If no description is created, the script is most likely not parseable and contains syntax errors.
  • In maven-built projects, the ruta-maven-plugin can be utilized to create the type system descriptions of Ruta scripts.
  • In plain Java, the RutaDescriptorFactory can be utilized to create the type system description programmatically. Here's a code example.

There are several ways to create and execute a Ruta-based analysis engine in plain Java code. Here's an example that does not use any additional files:

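import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import org.apache.commons.lang3.tuple.Pair;
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.factory.TypeSystemDescriptionFactory;
import org.apache.uima.fit.util.CasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceManager;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.ruta.descriptor.RutaBuildOptions;
import org.apache.uima.ruta.descriptor.RutaDescriptorFactory;
import org.apache.uima.ruta.descriptor.RutaDescriptorInformation;
import org.apache.uima.ruta.engine.RutaEngine;
import org.apache.uima.util.CasCreationUtils;

// the script declares a new type and annotates every capitalized word with it
String rutaScript = "DECLARE MyType; CW{-> MyType};";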

RutaDescriptorFactory descriptorFactory = new RutaDescriptorFactory();
RutaBuildOptions options = new RutaBuildOptions();
options.setResolveImports(true);
options.setImportByName(true);
RutaDescriptorInformation descriptorInformation = descriptorFactory
        .parseDescriptorInformation(rutaScript, options);
// replace null values for build environment if necessary (e.g., location in classpath)
Pair<AnalysisEngineDescription, TypeSystemDescription> descriptions = descriptorFactory
        .createDescriptions(null, null, descriptorInformation, options, null, null, null);

AnalysisEngineDescription rutaAnalysisEngineDescription = descriptions.getKey();
rutaAnalysisEngineDescription.getAnalysisEngineMetaData().getConfigurationParameterSettings().setParameterValue(RutaEngine.PARAM_RULES, rutaScript);
TypeSystemDescription rutaTypeSystemDescription = descriptions.getValue();
// directly set type system description since no file will be created
rutaAnalysisEngineDescription.getAnalysisEngineMetaData().setTypeSystem(rutaTypeSystemDescription);

ResourceManager resourceManager = UIMAFramework.newDefaultResourceManager();
AnalysisEngine ae = UIMAFramework.produceAnalysisEngine(rutaAnalysisEngineDescription);

// merge the type systems found by uimaFIT with the types declared in the script
List<TypeSystemDescription> typeSystemDescriptions = new ArrayList<>();
TypeSystemDescription scannedTypeSystemDescription = TypeSystemDescriptionFactory.createTypeSystemDescription();
typeSystemDescriptions.add(scannedTypeSystemDescription);
typeSystemDescriptions.add(rutaTypeSystemDescription);
TypeSystemDescription mergeTypeSystemDescription = CasCreationUtils.mergeTypeSystems(typeSystemDescriptions, resourceManager);

JCas jCas = JCasFactory.createJCas(mergeTypeSystemDescription);
CAS cas = jCas.getCas();
jCas.setDocumentText("This is my document.");
ae.process(jCas);

// the type declared in the script is available under the name "Anonymous.MyType"
Collection<AnnotationFS> select = CasUtil.select(cas, cas.getTypeSystem().getType("Anonymous.MyType"));
for (AnnotationFS each : select) {
  System.out.println(each.getCoveredText());
}

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 3
