Haodong Yang

Reputation: 11

How to import data using Mallet Java API

I am new to Mallet and am trying to use its CRF functionality for Named Entity Recognition. I know the website has an example showing how to import data using Java, but it deals with plain text, not data in the training-set format. My training data is available in the following format (exactly the format shown on the website): the first column is the word, and the second column is the label.

a   O
50  AGE
year    AGE
old O
man GENDER
with    O
a   O
history O
of  O
suicide O
attempt O
experienced O
an  O
epileptic   O
seizure O
on  O
22-dec-01   DATE
.   O 
----

Note: it is not visible in the rendered output, but the columns appear to be tab-separated.

So now I am stuck. How should I import the above data as a training set using the Mallet API?

I know how to do it from the command line, but I would like to write it in Java so that I can add more features using the API in the future.

Upvotes: 1

Views: 581

Answers (1)

drp

Reputation: 340

You can read training instances in Mallet using FileIterator, CsvIterator, or ArrayIterator. Usage of CsvIterator is shown at http://mallet.cs.umass.edu/classifier-devel.php. Usage examples for FileIterator and ArrayIterator are available at http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.pipe.iterator.FileIterator and http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.pipe.iterator.Arrayiterator, respectively.

Examples of using CRF from Java code in Mallet are available at http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.fst.CRF
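If it helps to see the two-column format handled in code, here is a minimal plain-Java sketch that groups the lines into sentences of tokens and labels. It does not use the Mallet API at all: `TaggedSentenceReader`, `Sentence`, and `parse` are hypothetical names made up for illustration. It assumes that the columns are separated by whitespace (a tab or spaces) and that sentences are separated by blank lines; adjust it if your file uses a different separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of Mallet: groups "word<TAB>label" lines into sentences.
class TaggedSentenceReader {

    /** One sentence, stored as parallel lists of tokens and labels. */
    static final class Sentence {
        final List<String> tokens = new ArrayList<>();
        final List<String> labels = new ArrayList<>();
    }

    /**
     * Parses two-column data.
     * Assumption: columns are whitespace-separated and a blank line ends a sentence.
     */
    static List<Sentence> parse(Iterable<String> lines) {
        List<Sentence> sentences = new ArrayList<>();
        Sentence current = new Sentence();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                // A blank line closes the current sentence if it has any tokens.
                if (!current.tokens.isEmpty()) {
                    sentences.add(current);
                    current = new Sentence();
                }
                continue;
            }
            String[] columns = trimmed.split("\\s+");
            if (columns.length != 2) {
                throw new IllegalArgumentException(
                        "Expected 2 columns but found " + columns.length + ": " + line);
            }
            current.tokens.add(columns[0]);
            current.labels.add(columns[1]);
        }
        // Keep the last sentence when the input does not end with a blank line.
        if (!current.tokens.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a\tO", "50\tAGE", "year\tAGE", "old\tO", "man\tGENDER",
                "",
                "on\tO", "22-dec-01\tDATE", ".\tO");
        for (Sentence s : parse(lines)) {
            System.out.println(s.tokens + " -> " + s.labels);
        }
    }
}
```

For a real file, the lines could come from `java.nio.file.Files.readAllLines(path)`. The resulting token and label lists would still need to be turned into Mallet instances through whichever iterator and pipe you choose.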

Upvotes: 0
