Reputation: 646
I am trying to create a Pig UDF that extracts the locations mentioned in a tweet using the Stanford CoreNLP package interfaced through the sista Scala API. It works fine when run locally with 'sbt run', but throws a "java.lang.NoSuchMethodError" exception when called from Pig:
Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
2013-06-14 10:47:54,952 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
done [7.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ...
2013-06-14 10:48:02,108 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 18546688(18112K) used = 358671232(350264K) committed = 366542848(357952K) max = 699072512(682688K)
done [5.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ...
2013-06-14 10:48:10,522 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call- Usage threshold init = 18546688(18112K) used = 590012928(576184K) committed = 597786624(583776K) max = 699072512(682688K)
done [5.6 sec].
2013-06-14 10:48:11,469 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoSuchMethodError: org.joda.time.Duration.compareTo(Lorg/joda/time/ReadableDuration;)I
    at edu.stanford.nlp.time.SUTime$Duration.compareTo(SUTime.java:3406)
    at edu.stanford.nlp.time.SUTime$Duration.max(SUTime.java:3488)
    at edu.stanford.nlp.time.SUTime$Time.difference(SUTime.java:1308)
    at edu.stanford.nlp.time.SUTime$Range.(SUTime.java:3793)
    at edu.stanford.nlp.time.SUTime.(SUTime.java:570)
Here is the relevant code:
object CountryTokenizer {
  def tokenize(text: String): String = {
    val locations = TweetEntityExtractor.NERLocationFilter(text)
    println(locations)
    locations.map(x => Cities.country(x)).flatten.mkString(" ")
  }
}

class PigCountryTokenizer extends EvalFunc[String] {
  override def exec(tuple: Tuple): java.lang.String = {
    val text: java.lang.String = Util.cast[java.lang.String](tuple.get(0))
    CountryTokenizer.tokenize(text)
  }
}

object TweetEntityExtractor {
  val processor: Processor = new CoreNLPProcessor()

  def NERLocationFilter(text: String): List[String] = {
    val doc = processor.mkDocument(text)
    processor.tagPartsOfSpeech(doc)
    processor.lemmatize(doc)
    processor.recognizeNamedEntities(doc)
    val locations = doc.sentences.map(sentence => {
      val entities = sentence.entities.map(List.fromArray(_)) match {
        case Some(l) => l
        case _ => List()
      }
      val words = List.fromArray(sentence.words)
      (words zip entities).filter(x => {
        x._1 != "" && x._2 == "LOCATION"
      }).map(_._1)
    })
    List.fromArray(locations).flatten
  }
}
I am using sbt-assembly to construct a fat-jar, and so the joda-time jar file should be accessible. What is going on?
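A quick way to test that assumption is to ask the JVM where it actually loaded the class from. The following is only a minimal sketch: the object name WhichJar and both of its methods are hypothetical helpers introduced here for illustration, not part of Pig, CoreNLP, or the sista API.

```scala
object WhichJar {
  /** Location (jar or directory) a loaded class came from, if the JVM reports one. */
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain) // may be null in principle
      source <- Option(domain.getCodeSource)    // null for classes from the boot loader
      url    <- Option(source.getLocation)
    } yield url.toString

  /** Same lookup by class name; None if the class cannot be found. */
  def locationOfName(name: String): Option[String] =
    try locationOf(Class.forName(name))
    catch { case _: ClassNotFoundException => None }
}
```

Calling println(WhichJar.locationOfName("org.joda.time.Duration")) from inside exec would show whether the class comes from the fat jar or from some other jar on the classpath when the UDF runs under Pig.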
Upvotes: 0
Views: 619
Reputation: 646
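Pig ships with its own version of joda-time (1.6), which is incompatible with 2.x. The stack trace shows SUTime calling Duration.compareTo(ReadableDuration), a method that exists in joda-time 2.x but not in 1.6, so the copy bundled in the fat jar is evidently not the one being loaded when the UDF runs under Pig.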
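One possible way around a clash like this is to relocate the joda-time classes inside the fat jar so they no longer share a package name with Pig's copy. The build.sbt fragment below is a sketch that assumes a version of sbt-assembly with shading support (0.14.0 or later) and sbt 1.x slash syntax. The target package name shaded.jodatime is an arbitrary choice, and the fragment has not been tested against this project.

```scala
// build.sbt (fragment): rename joda-time packages in the assembled jar.
// "shaded.jodatime" is a made-up target package; any unused name would do.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.joda.time.**" -> "shaded.jodatime.@1").inAll
)
```

With .inAll the rename is applied to the project's own classes and to every dependency, so references to org.joda.time inside CoreNLP are rewritten to the relocated package as well. It is worth confirming afterwards that the time-dependent parts of CoreNLP still behave as expected.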
Upvotes: 0