kyrre

Reputation: 646

Pig Latin joda-time error with Stanford CoreNLP

I am trying to create a Pig UDF that extracts the locations mentioned in a tweet using the Stanford CoreNLP package interfaced through the sista Scala API. It works fine when run locally with 'sbt run', but throws a "java.lang.NoSuchMethodError" exception when called from Pig:

Loading default properties from tagger edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Reading POS tagger model from edu/stanford/nlp/models/pos-tagger/english-left3words/english-left3words-distsim.tagger
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz
2013-06-14 10:47:54,952 [communication thread] INFO org.apache.hadoop.mapred.LocalJobRunner - reduce > reduce
done [7.5 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ...
2013-06-14 10:48:02,108 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call - Collection threshold init = 18546688(18112K) used = 358671232(350264K) committed = 366542848(357952K) max = 699072512(682688K)
done [5.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ...
2013-06-14 10:48:10,522 [Low Memory Detector] INFO org.apache.pig.impl.util.SpillableMemoryManager - first memory handler call- Usage threshold init = 18546688(18112K) used = 590012928(576184K) committed = 597786624(583776K) max = 699072512(682688K)
done [5.6 sec].
2013-06-14 10:48:11,469 [Thread-11] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoSuchMethodError: org.joda.time.Duration.compareTo(Lorg/joda/time/ReadableDuration;)I
    at edu.stanford.nlp.time.SUTime$Duration.compareTo(SUTime.java:3406)
    at edu.stanford.nlp.time.SUTime$Duration.max(SUTime.java:3488)
    at edu.stanford.nlp.time.SUTime$Time.difference(SUTime.java:1308)
    at edu.stanford.nlp.time.SUTime$Range.(SUTime.java:3793)
    at edu.stanford.nlp.time.SUTime.(SUTime.java:570)

Here is the relevant code:

object CountryTokenizer {
  def tokenize(text: String): String = {
    val locations = TweetEntityExtractor.NERLocationFilter(text)
    println(locations)
    locations.map(x => Cities.country(x)).flatten.mkString(" ")
  }
}

class PigCountryTokenizer extends EvalFunc[String] {
  override def exec(tuple: Tuple): java.lang.String = {
    val text: java.lang.String = Util.cast[java.lang.String](tuple.get(0))
    CountryTokenizer.tokenize(text)
  }
}

object TweetEntityExtractor {
  // Single shared CoreNLP pipeline, created once when the object is first used.
  val processor: Processor = new CoreNLPProcessor()

  // Returns every token that the NER tagger labels as LOCATION.
  def NERLocationFilter(text: String): List[String] = {
    val doc = processor.mkDocument(text)

    processor.tagPartsOfSpeech(doc)
    processor.lemmatize(doc)
    processor.recognizeNamedEntities(doc)

    val locations = doc.sentences.map(sentence => {
      // entities is an Option; a missing annotation means "no entities".
      val entities = sentence.entities.map(_.toList).getOrElse(List())
      val words = sentence.words.toList

      // Pair each word with its NER label and keep the non-empty LOCATION tokens.
      (words zip entities).filter(x => {
        x._1 != "" && x._2 == "LOCATION"
      }).map(_._1)
    })
    locations.toList.flatten
  }
}

I am using sbt-assembly to construct a fat-jar, and so the joda-time jar file should be accessible. What is going on?

Upvotes: 0

Views: 619

Answers (1)

kyrre

Reputation: 646

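Pig ships with its own version of joda-time (1.6), which is incompatible with 2.x. When the UDF runs inside Pig, the older copy is evidently the one that gets loaded, even though the fat jar bundles a newer one. The method named in the stack trace, org.joda.time.Duration.compareTo(ReadableDuration), does not exist in 1.6, hence the NoSuchMethodError.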
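To confirm which copy of joda-time actually wins at runtime, it can help to print where a class was loaded from. The following is only a minimal sketch: `WhichJar` and `locationOf` are hypothetical names invented for illustration, not part of Pig, CoreNLP, or joda-time, and it relies only on standard JDK reflection calls.

```scala
object WhichJar {
  // Returns the location (jar file or directory) a class was loaded from,
  // or None when no code source is available, as is the case for classes
  // loaded by the bootstrap class loader.
  def locationOf(cls: Class[_]): Option[String] =
    for {
      domain <- Option(cls.getProtectionDomain)
      source <- Option(domain.getCodeSource)
      url    <- Option(source.getLocation)
    } yield url.toString

  def main(args: Array[String]): Unit = {
    // Defaults to the joda-time class from the stack trace; pass another
    // fully qualified class name as the first argument to inspect it instead.
    val className = args.headOption.getOrElse("org.joda.time.Duration")
    try {
      val cls = Class.forName(className)
      println(className + " loaded from " +
        locationOf(cls).getOrElse("<no code source>"))
    } catch {
      case _: ClassNotFoundException =>
        println(className + " is not on the class path")
    }
  }
}
```

Calling `WhichJar.locationOf(classOf[org.joda.time.Duration])` from inside the UDF's `exec` and logging the result should show whether the class comes from the fat jar or from a jar shipped with Pig. If it is Pig's copy, one common remedy is to relocate (shade) the `org.joda.time` package inside the fat jar so the two versions no longer collide.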

Upvotes: 0
