Reputation: 11
I am new to Mallet and am trying to use its CRF functionality for Named Entity Recognition. I know their website has an example showing how to import data in Java, but it deals with plain text rather than data in the training-set format. My training data is in the following format (the same format shown on the website): the first column is the word and the second column is the label.
a O
50 AGE
year AGE
old O
man GENDER
with O
a O
history O
of O
suicide O
attempt O
experienced O
an O
epileptic O
seizure O
on O
22-dec-01 DATE
. O
----
Note: It is not visible in the rendered output, but the columns appear to be tab-separated.
So now I am stuck. How should I import the above data as a training set using the Mallet API?
I know how to do it from the command line, but I would like to write it in Java so that I can add more features through the API in the future.
Upvotes: 1
Views: 581
Reputation: 340
You can read training instances using FileIterator, CsvIterator, or ArrayIterator in Mallet. You can find usage of CsvIterator at http://mallet.cs.umass.edu/classifier-devel.php. FileIterator and ArrayIterator usage is available at http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.pipe.iterator.FileIterator and http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.pipe.iterator.Arrayiterator respectively.
You can find information on how to use CRF through Java code in Mallet at http://www.programcreek.com/java-api-examples/index.php?api=cc.mallet.fst.CRF
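Before handing the file to any of these iterators, it may help to normalize it first. The following is only a minimal sketch in plain Java with no Mallet calls: it groups the two-column lines into sequences and rewrites them as space-separated "word label" lines with a blank line between sequences, which, as far as I understand, is the layout Mallet's sequence-tagging tools expect. The class name `TaggedFileReader` and both method names are hypothetical, and treating a dashes-only line such as `----` as a sequence boundary is an assumption about the data above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mallet: parses "word<TAB>label" lines into sequences.
final class TaggedFileReader {

    // Returns one list per sequence; each element is a {word, label} pair.
    static List<List<String[]>> parse(String text) {
        List<List<String[]>> sequences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String rawLine : text.split("\\R", -1)) {
            String line = rawLine.trim();
            // Assumption: a blank line or a dashes-only line ends the current sequence.
            if (line.isEmpty() || line.matches("-+")) {
                if (!current.isEmpty()) {
                    sequences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // Accept tabs or spaces between the two columns.
            String[] cols = line.split("\\s+");
            if (cols.length != 2) {
                throw new IllegalArgumentException("Expected 'word label' but got: " + line);
            }
            current.add(cols);
        }
        if (!current.isEmpty()) {
            sequences.add(current);
        }
        return sequences;
    }

    // Writes each token as "word label" and ends every sequence with a blank line.
    static String toSpaceSeparated(List<List<String[]>> sequences) {
        StringBuilder out = new StringBuilder();
        for (List<String[]> sequence : sequences) {
            for (String[] token : sequence) {
                out.append(token[0]).append(' ').append(token[1]).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

The string returned by `toSpaceSeparated` could then be wrapped in a `java.io.StringReader`, or written back to disk, and passed to whichever iterator and pipe you settle on. Keeping the parsed `{word, label}` pairs around also gives you a place to compute extra features per token later.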
Upvotes: 0