VB_

Reputation: 97

Spark - Scala - saveAsHadoopFile throwing error

I am trying to troubleshoot this issue but cannot get any further. Can anyone please help?

import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class KeyBasedOutput[T >: Null, V <: AnyRef] extends MultipleTextOutputFormat[T, V] {
  override def generateFileNameForKeyValue(key: T, value: V, leaf: String) = {
    key.toString
  }
  override def generateActualKey(key: T, value: V) = {
    null
  }
}

val cp1 = sqlContext.sql("select * from d_prev_fact")
  .map(t => t.mkString("\t"))
  .map { x =>
    val parts = x.split("\t")
    val partition_key = parts(3)
    val rows = parts.slice(0, parts.length).mkString("\t")
    ("date=" + partition_key.toString, rows.toString)
  }

cp1.saveAsHadoopFile(FACT_CP)

I get the error below and am not able to debug it:

scala> cp1.saveAsHadoopFile(FACT_CP,classOf[String],classOf[String],classOf[KeyBasedOutput[String, String]])
java.lang.RuntimeException: java.lang.NoSuchMethodException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$KeyBasedOutput.<init>()
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
    at org.apache.hadoop.mapred.JobConf.getOutputFormat(JobConf.java:709)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:742)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:674)

The idea is to write the values into multiple folders based on the key.

Upvotes: 3

Views: 2576

Answers (2)

x4444

Reputation: 2172

Compile KeyBasedOutput into a jar and start the shell with spark-shell --jars /path/to/the/jar
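
A likely reason this helps, judging from the stack trace: ReflectionUtils.newInstance looks up a zero-argument constructor, while a class typed into spark-shell ends up nested inside generated wrapper classes (the $iwC names), so its constructor expects the enclosing instance. The sketch below reproduces that difference with plain Scala and no Spark or Hadoop dependency. ReplWrapper, Inner, TopLevel and NoArgCtorDemo are hypothetical stand-ins, not anything from the question's code.

```scala
// Stand-in for the wrapper classes spark-shell generates around REPL input
// (the $iwC names in the stack trace). All names here are hypothetical.
class ReplWrapper {
  val tag = "wrapper"
  // Inner class: its constructor takes the enclosing ReplWrapper instance.
  class Inner { def outerTag: String = tag }
}

// Top-level class, as it would be when compiled into a jar.
class TopLevel

object NoArgCtorDemo {
  // True if the class declares the zero-argument constructor that
  // reflection-based instantiation looks up.
  def hasNoArgConstructor(c: Class[_]): Boolean =
    try { c.getDeclaredConstructor(); true }
    catch { case _: NoSuchMethodException => false }

  def main(args: Array[String]): Unit = {
    println(hasNoArgConstructor(classOf[ReplWrapper#Inner]))
    println(hasNoArgConstructor(classOf[TopLevel]))
  }
}
```

Running NoArgCtorDemo should report that the inner class lacks a zero-argument constructor while the top-level class has one, which is the same distinction as between a class defined in the shell and one loaded from a jar.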

Upvotes: 1

reggert

Reputation: 742

I'm not certain, but I think type erasure combined with reflection may be causing this problem. Try defining a non-generic subclass of KeyBasedOutput that hard-codes the type parameters, and pass that class to saveAsHadoopFile instead.

class StringKeyBasedOutput extends KeyBasedOutput[String, String]

Upvotes: 0
