Reputation: 31
I am using UIMA RUTA to annotate a wide range of documents. They come from different sources, and sometimes the character combination <! appears in the middle of a document. The text after <! is annotated as MARKUP and is then ignored by the other annotations.
Is there an option to turn off this behavior? Even if I switch off the MARKUP annotations, the text after <! is still not annotated by any other annotations.
I found the part of the code that creates most of the MARKUP annotations (DefaultSeeder in the org.apache.uima.ruta.seed package), but I cannot find which part of the code is responsible for the MARKUP annotations starting with <!.
Thanks for any suggestions!
Upvotes: 1
Views: 45
Reputation: 3113
There are several options. Most likely, you want to configure the RutaEngine to use a different seeder, namely the TextSeeder instead of the DefaultSeeder. The TextSeeder does not create MARKUP annotations.
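A minimal sketch of the seeder option, assuming the pipeline is built with uimaFIT and that the TextSeeder lives in the same org.apache.uima.ruta.seed package as the DefaultSeeder (the fully qualified name is an assumption). It builds a RutaEngine description whose seeders parameter (RutaEngine.PARAM_SEEDERS) lists only the TextSeeder; the script name uima.ruta.example.Main is hypothetical and stands in for your own main script.

```java
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.ruta.engine.RutaEngine;

public class TextSeederConfig {
  // Assumed fully qualified class name; only "TextSeeder" is named in the answer.
  static final String TEXT_SEEDER = "org.apache.uima.ruta.seed.TextSeeder";

  /** Builds a RutaEngine description that seeds with the TextSeeder instead of the DefaultSeeder. */
  static AnalysisEngineDescription createDescription(String mainScript) throws Exception {
    return AnalysisEngineFactory.createEngineDescription(
        RutaEngine.class,
        RutaEngine.PARAM_MAIN_SCRIPT, mainScript,
        // Overrides the default seeder list, so no MARKUP annotations are seeded
        RutaEngine.PARAM_SEEDERS, new String[] {TEXT_SEEDER});
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical script name; replace with your own main script
    AnalysisEngineDescription desc = createDescription("uima.ruta.example.Main");
    // Read the configured seeders back from the description to confirm the override
    String[] seeders = (String[]) desc.getAnalysisEngineMetaData()
        .getConfigurationParameterSettings()
        .getParameterValue(RutaEngine.PARAM_SEEDERS);
    System.out.println(String.join(",", seeders));
  }
}
```

The resulting description can then be run like any other engine description, for example with uimaFIT's SimplePipeline.runPipeline(...). If you use an XML descriptor instead, the same idea applies: override the seeders parameter there.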
You can also change the visibility settings so that text covered by MARKUP annotations is processed normally, e.g., with ADDRETAINTYPE(MARKUP);
Upvotes: 0