Reputation: 91
I am new to UIMA ...
I want to connect to a database, extract data, process it using the UIMA regex annotator, and write the results back to the database.
Example:
Table: emp
Name Department EmpId
AB-C Sale's 2134[3]
XYZ, Fina&nce 23423
PQ#R Marketing 234(47
This should be cleaned up using the UIMA regex annotator.
Desired Output
Name Department EmpId
ABC Sales 21343
XYZ Finance 23423
PQR Marketing 23447
I have installed UIMA, Eclipse and the relevant JDBC drivers to connect to the database.
Thanks in advance
Upvotes: 1
Views: 287
Reputation: 1229
There are a couple of ways to achieve this.
The simplest (though less extensible) way would be to write three classes (uimaFIT makes the coding easier: http://uima.apache.org/uimafit.html#Documentation):
CollectionReader: read all the data into objects, then iterate over the objects and create a JCas from each one. You can store the primary key in an annotation.
Analysis Engine: use the UIMA regex annotator to manipulate the JCas's document text.
Consumer: read the JCas document text and use the primary key to update the database.
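To make the middle step concrete, here is a minimal sketch of the cleaning rules in plain Java. It uses java.util.regex as a stand-in and does not call the UIMA API at all. EmpRow, EmpCleaner and rowId are hypothetical names (rowId plays the role of the primary key that the reader would carry along), and the two patterns are assumptions read off the desired output in the question: Name and Department keep only letters, EmpId keeps only digits.

```java
import java.util.regex.Pattern;

// Hypothetical holder for one row of the emp table.
// rowId is an assumed primary key that travels with the row so the
// consumer knows which record to update.
final class EmpRow {
    final int rowId;
    final String name;
    final String department;
    final String empId;

    EmpRow(int rowId, String name, String department, String empId) {
        this.rowId = rowId;
        this.name = name;
        this.department = department;
        this.empId = empId;
    }
}

// Hypothetical cleaning step: the regex rules the analysis step would apply.
final class EmpCleaner {
    // Assumption: names and departments consist of ASCII letters only.
    private static final Pattern NON_LETTERS = Pattern.compile("[^A-Za-z]");
    // Assumption: employee ids consist of digits only.
    private static final Pattern NON_DIGITS = Pattern.compile("[^0-9]");

    static String lettersOnly(String value) {
        return NON_LETTERS.matcher(value).replaceAll("");
    }

    static String digitsOnly(String value) {
        return NON_DIGITS.matcher(value).replaceAll("");
    }

    // Returns a cleaned copy; the primary key is passed through unchanged.
    static EmpRow clean(EmpRow row) {
        return new EmpRow(
                row.rowId,
                lettersOnly(row.name),
                lettersOnly(row.department),
                digitsOnly(row.empId));
    }
}
```

Calling EmpCleaner.clean(new EmpRow(1, "AB-C", "Sale's", "2134[3]")) gives a row with name ABC, department Sales and empId 21343. Note that the letters-only rule would also strip spaces and hyphens from real names, so loosen the pattern if your data needs them.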
A better way would be to abstract the reading and writing by creating an external resource (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources) that connects to the database. Give it hasNext() and next() methods, which are very convenient to use from the CollectionReader and the Consumer. This has the advantage that all the initialisation logic is isolated in one place. When using uimaFIT, you can use configuration parameter injection (http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.configurationparameters), for example to make the connection string and the search query configurable.
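The hasNext()/next() part can be sketched with plain JDBC. A ResultSet only offers next(), which both tests for and moves to the following row, so the wrapper has to fetch one row ahead and buffer it. ResultSetCursor is a hypothetical name, reading every column as a String is an assumption, and the uimaFIT external resource wiring is deliberately left out, so treat this as a rough sketch rather than a finished resource.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper: exposes a JDBC ResultSet through hasNext()/next().
final class ResultSetCursor implements Iterator<String[]>, AutoCloseable {
    private final ResultSet rs;
    private final int columnCount;
    private String[] buffered;   // row fetched ahead by hasNext(), or null
    private boolean exhausted;   // true once the ResultSet reported no more rows

    ResultSetCursor(ResultSet rs, int columnCount) {
        this.rs = rs;
        this.columnCount = columnCount;
    }

    @Override
    public boolean hasNext() {
        if (buffered != null) {
            return true;          // a row is already waiting
        }
        if (exhausted) {
            return false;         // do not call rs.next() again after the end
        }
        try {
            if (rs.next()) {
                String[] row = new String[columnCount];
                for (int i = 0; i < columnCount; i++) {
                    row[i] = rs.getString(i + 1);   // JDBC columns are 1-based
                }
                buffered = row;
                return true;
            }
            exhausted = true;
            return false;
        } catch (SQLException e) {
            throw new IllegalStateException("Failed to read the next row", e);
        }
    }

    @Override
    public String[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String[] row = buffered;
        buffered = null;          // hand the buffered row out exactly once
        return row;
    }

    @Override
    public void close() throws SQLException {
        rs.close();
    }
}
```

The CollectionReader can then loop with while (cursor.hasNext()) and build one JCas per row, without knowing anything about the connection or the query.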
Use the SimplePipeline class in uimaFIT to run your pipeline: http://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.pipelines
Upvotes: 2