Krishna Das

Reputation: 91

UIMA for structured data

I am new to UIMA ...

I want to connect to a database, extract data, process it with the UIMA regex annotator, and write the results back to the database.

Example:
Table: emp

Name       Department      EmpId  
AB-C       Sale's          2134[3]  
XYZ,       Fina&nce        23423  
PQ#R       Marketing       234(47  

This should be transformed using the UIMA regex annotator.

Desired Output

Name       Department      EmpId  
ABC        Sales           21343  
XYZ        Finance         23423  
PQR        Marketing       23447  
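
For reference, the cleanup in this example amounts to deleting every character that is not a letter or a digit. The following is a minimal sketch of that rule in plain `java.util.regex`, not a UIMA regex annotator configuration. The class name `EmpFieldCleaner` and the pattern `[^A-Za-z0-9]` are assumptions inferred from the sample rows above; real data may need a stricter or looser rule per column.

```java
import java.util.regex.Pattern;

final class EmpFieldCleaner {
    // Assumption: anything other than an ASCII letter or digit is noise.
    private static final Pattern NOISE = Pattern.compile("[^A-Za-z0-9]");

    /** Returns the input with every non-alphanumeric character removed. */
    static String clean(String raw) {
        return NOISE.matcher(raw).replaceAll("");
    }

    public static void main(String[] args) {
        // The three sample rows from the emp table in the question.
        String[][] rows = {
            {"AB-C", "Sale's", "2134[3]"},
            {"XYZ,", "Fina&nce", "23423"},
            {"PQ#R", "Marketing", "234(47"},
        };
        for (String[] row : rows) {
            System.out.println(clean(row[0]) + " " + clean(row[1]) + " " + clean(row[2]));
        }
    }
}
```

The same expression could serve as the match pattern in whatever tool ends up doing the replacement.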

I have installed UIMA, Eclipse, and the relevant JDBC drivers to connect to the database.

Thanks in advance

Upvotes: 1

Views: 287

Answers (1)

jvdbogae

Reputation: 1229

There are a couple of ways to achieve this.

The simplest (though less extensible) way is to write three classes (uimaFIT, http://uima.apache.org/uimafit.html#Documentation, makes the coding easier):

CollectionReader:
- read all the data into objects
- iterate over the objects and create a JCas from each one; you can store the primary key in an annotation

Analysis Engine:
- use the UIMA regex annotator to manipulate the JCas's documentText

Consumer:
- read the JCas documentText and use the primary key to update the database
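
To make the division of labour concrete, here is a rough sketch of the three roles in plain Java with an in-memory map standing in for the database table. It deliberately uses no UIMA or uimaFIT classes: `ThreeStageSketch`, `Doc`, `reader`, `analyse`, and `consume` are hypothetical names, the integer key is an assumed primary key, and the regex is just one possible cleanup rule. In a real pipeline each method would become a CollectionReader, an Analysis Engine, and a Consumer respectively.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ThreeStageSketch {
    /** One unit of work: the row's primary key plus the text to process (hypothetical type). */
    static final class Doc {
        final int primaryKey;
        String text;

        Doc(int primaryKey, String text) {
            this.primaryKey = primaryKey;
            this.text = text;
        }
    }

    /** Reader role: loads all rows and hands them out one by one. */
    static Iterator<Doc> reader(Map<Integer, String> table) {
        List<Doc> docs = new ArrayList<>();
        for (Map.Entry<Integer, String> row : table.entrySet()) {
            docs.add(new Doc(row.getKey(), row.getValue()));
        }
        return docs.iterator();
    }

    /** Analysis role: cleans the text (assumed rule: drop non-alphanumerics). */
    static void analyse(Doc doc) {
        doc.text = doc.text.replaceAll("[^A-Za-z0-9]", "");
    }

    /** Consumer role: writes the processed text back under the same primary key. */
    static void consume(Doc doc, Map<Integer, String> table) {
        table.put(doc.primaryKey, doc.text);
    }

    /** Runs the three stages in order and returns the updated copy of the table. */
    static Map<Integer, String> run(Map<Integer, String> table) {
        Map<Integer, String> updated = new LinkedHashMap<>(table);
        Iterator<Doc> docs = reader(table);
        while (docs.hasNext()) {
            Doc doc = docs.next();
            analyse(doc);
            consume(doc, updated);
        }
        return updated;
    }
}
```

Carrying the primary key alongside the text is what lets the last stage update the right row without re-querying the database.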

A better way is to abstract the reading and writing by creating an external resource (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources) that connects to the database. Give it hasNext() and next() methods, which are very convenient to call from the CollectionReader and the Consumer. The advantage is that all initialisation logic is isolated in one place. With uimaFIT you can also use configuration parameter injection (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.configurationparameters), for example to make the connection string and the search query configurable.

Use the SimplePipeline class in uimaFIT to run your pipeline: http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.pipelines

Upvotes: 2
