Arnold Angel

Reputation: 47

Stanford CoreNLP Training Examples

Does anyone know where the following files are located?

trainFileList = /u/nlp/data/ner/column_data/muc6.ptb.train, /u/nlp/data/ner/column_data/muc7.ptb.train

I am following the FAQ link http://nlp.stanford.edu/software/crf-faq.shtml#a

If all I need to provide is a two-column file of tokens and their classes, then that will work for me. But I am curious about the train files listed in the classifier property files.
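A minimal sketch of such a two-column file, assuming tab-separated columns (token, then class) with `O` marking tokens outside any entity, as in the FAQ example. The tokens, labels, and the file name `my-train.tsv` are hypothetical, made up for illustration only.

```python
# Hypothetical hand-labeled tokens; "O" marks tokens that are not part of an entity.
LABELED_TOKENS = [
    ("Arnold", "PERSON"),
    ("works", "O"),
    ("at", "O"),
    ("Stanford", "ORGANIZATION"),
    (".", "O"),
]


def format_rows(rows):
    """Return the training data as text: one token<TAB>class pair per line."""
    return "".join(f"{token}\t{label}\n" for token, label in rows)


def write_training_file(rows, path):
    """Write the two-column training data to `path` as UTF-8 text."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(format_rows(rows))


if __name__ == "__main__":
    # "my-train.tsv" is an assumed name; point trainFile at whatever you choose.
    write_training_file(LABELED_TOKENS, "my-train.tsv")
```

With a file like this, the property file would point `trainFile` at it and, following the FAQ, declare the column layout with `map = word=0,answer=1`.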

serializeTo = english.muc.7class.caseless.distsim.crf.ser.gz

java -mx1g -cp "$CLASSPATH" edu.stanford.nlp.ie.NERClassifierCombiner -ner.model classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz -outputFormat tabbedEntities -textFile sample.txt > sample2.tsv

Upvotes: 1

Views: 1183

Answers (1)

StanfordNLPHelp

Reputation: 8739

Those files are the training data for the MUC-6 and MUC-7 tasks:

http://cs.nyu.edu/faculty/grishman/muc6.html

They are not distributed by Stanford. I will see if I can figure out where they are distributed and update this answer.

UPDATE: LDC distributes those files. They are under copyright, so you have to purchase a copy from LDC, which is why we don't distribute them. Here are some links with more info:

http://www-nlpir.nist.gov/related_projects/muc/muc_data/muc_data_index.html

https://catalog.ldc.upenn.edu/LDC2003T13

https://catalog.ldc.upenn.edu/LDC2001T02

Upvotes: 1
