sande

Reputation: 664

Scala getting java exception while trying to create RDD which has more than 255 columns

I have a huge data set with almost 600 columns, but while I am trying to create a DF it is failing with

Exception in thread "main" java.lang.ClassFormatError: Too many arguments in method signature in class file

Sample code:

def main(args: Array[String]): Unit = {
  val data = sc.textFile(file);
  val rd = data.map(line => line.split(",")).map(row => new Parent(row(0), row(1), ........row(600)))
  rd.toDF.write.mode("append").format("orc").insertInto("Table")
}

Can someone suggest a workaround for this?

Upvotes: 0

Views: 668

Answers (1)

Ra41P

Reputation: 774

I believe the JVM limits a method (including a constructor) to 255 arguments, and that limit extends to Scala classes as well. A class with 600 constructor parameters is therefore not feasible.

The cleanest solution would be to let Spark read the CSV natively:

spark.read.csv(filePath)
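Put together, your pipeline could look roughly like this (a sketch, not a drop-in fix: the file path is a placeholder, and the table name is taken from your question):

```scala
import org.apache.spark.sql.SparkSession

object CsvToOrc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CsvToOrc")
      .enableHiveSupport() // needed for insertInto on a Hive table
      .getOrCreate()

    // Spark infers one DataFrame column per CSV field, so no 600-argument
    // constructor is ever generated and the 255-argument limit never applies.
    val df = spark.read.csv("path/to/file.csv") // placeholder path

    df.write.mode("append").format("orc").insertInto("Table")
  }
}
```

With this approach the columns get default names (`_c0`, `_c1`, ...); you can rename them with `toDF(...)` or apply an explicit schema if you need meaningful names.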

Additionally, you may choose to increase the maxColumns option, via:

spark.read.option("maxColumns", ...).csv(filePath)

While it does not directly affect your use-case (600 columns is well within range), maxColumns defaults to 20480. More information on these parameters can be found here.
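As a concrete illustration (the 25000 value and the file path are placeholders for a file that actually exceeds the default cap):

```scala
// Raise the CSV parser's column cap (default 20480) only when a file
// genuinely has more columns than that; otherwise the default suffices.
val wide = spark.read
  .option("maxColumns", "25000") // hypothetical value for illustration
  .csv("path/to/very_wide_file.csv") // placeholder path
```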

Upvotes: 2
