D Yogendra Rao

Reputation: 59

Search for an item in a text file using UIMA Ruta

I am trying to find a particular item in a text file.

The text file looks like this:

>HEADING
00345
XYZ
MethodName : fdsafk
Date: 23-4-2012
More text and some part containing instances of XYZ

I first ran a dictionary search for XYZ and found all of its positions, but I want only the first XYZ, not the others. The first XYZ always appears between the five-digit code and the text MethodName.

However, I have not managed to express this in Ruta. This is what I have so far:

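// Mark every entry of Zipcode.txt as Zip
WORDLIST ZipList = 'Zipcode.txt';
DECLARE Zip;
Document{-> MARKFAST(Zip, ZipList)};

// Mark the literal text "MethodName" as Method
DECLARE Method;
"MethodName" -> Method;

// Mark every entry of typelist.txt (e.g. XYZ) as type
WORDLIST typelist = 'typelist.txt';
DECLARE type;
Document{-> MARKFAST(type, typelist)};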

Also, how can I use regular expressions in UIMA Ruta?

Upvotes: 1

Views: 174

Answers (1)

Peter Kluegl

Reputation: 3113

There are many ways to specify this. Here are some examples (not tested):

// just remove the other annotations (assuming type is the one you want)
type{-> UNMARK(type)} ANY{-STARTSWITH(Method)};

// only keep the first one: remove any annotation if there is one somewhere in front of it
// you can also specify this with POSITION or CURRENTCOUNT, but both are slow
type # @type{-> UNMARK(type)};

// just create a new annotation in between
NUM{REGEXP(".....")} #{-> type} @Method;

There are two ways to use regular expressions in UIMA Ruta:

  • (find) simple regex rules, which search the document text for the pattern, like "[A-Za-z]+" -> Type;
  • (matches) REGEXP conditions, which validate the text covered by a rule element, like
    ANY{REGEXP("[A-Za-z]+") -> Type};
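To make the find/matches difference concrete, here is a minimal, untested sketch applied to the five-digit code from the question. The type names FoundCode, CheckedCode and FirstXYZ are hypothetical, and the last rule just reuses the "create a new annotation in between" idea from above:

```ruta
// Hypothetical type names, chosen only for this illustration
DECLARE FoundCode, CheckedCode, Method, FirstXYZ;

// (find) simple regex rule: the pattern is searched in the document text,
// so it would also mark five digits inside a longer digit run like 1234567
"[0-9]{5}" -> FoundCode;

// (matches) REGEXP condition: NUM is Ruta's built-in number token, and the
// rule element only matches if the token's text fits the pattern
NUM{REGEXP("[0-9]{5}") -> CheckedCode};

// Use the validated code as the left boundary of the item before MethodName
"MethodName" -> Method;
CheckedCode #{-> FirstXYZ} @Method;
```

On the sample file both rules would mark 00345; they differ on input such as a longer digit run, where the simple regex rule still marks a five-digit substring while the REGEXP condition rejects the whole NUM token.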

Let me know if anything is unclear and I will extend the description.

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 1
