Igneous01

Reputation: 739

Spark SQL - java.lang.UnsupportedOperationException: empty.init when casting column

I am getting the following error when casting a column read from a comma-separated CSV file with headers.

Here is the code I am using:

var df = spark.read.option("header","true").option("delimiter",",").csv("/user/sample/data")
df.withColumn("columnCast", expr("CAST(SaleAmount) AS LONG")).count

This throws the exception below every time. I've tried casting different columns; some throw and others do not. The following variant throws the same exception:

df.withColumn("columnCast", expr("CAST(NULL) AS LONG")).count

java.lang.UnsupportedOperationException: empty.init
    at scala.collection.TraversableLike$class.init(TraversableLike.scala:451)
    at scala.collection.mutable.ArrayOps$ofInt.scala$collection$IndexedSeqOptimized$$super$init(ArrayOps.scala:234)
    at scala.collection.IndexedSeqOptimized$class.init(IndexedSeqOptimized.scala:135)
    at scala.collection.mutable.ArrayOps$ofInt.init(ArrayOps.scala:234)
    at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$7$$anonfun$11.apply(FunctionRegistry.scala:565)
    at org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$7$$anonfun$11.apply(FunctionRegistry.scala:558)
    at scala.Option.getOrElse(Option.scala:121)

I have tried running this in both spark-shell and Zeppelin. The Spark version is 2.4.0.cloudera2, managed by Cloudera.

What is causing this behaviour? Is this intended? How do I handle this?

Upvotes: 2

Views: 2146

Answers (1)

triskelion

Reputation: 66

You can use the Column's cast method to do the cast instead of a SQL expression string:

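import org.apache.spark.sql.functions.lit
import spark.implicits._

// Sample DataFrame with a single integer column
val df = spark.sparkContext.parallelize(1 to 10).toDF("col1")
// Add a null string column, then cast it to long with Column.cast
val casted = df.withColumn("test", lit(null).cast("string"))
               .withColumn("testCast", $"test".cast("long"))
casted.show()
casted.printSchema()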

Result:

+----+----+--------+
|col1|test|testCast|
+----+----+--------+
|   1|null|    null|
|   2|null|    null|
|   3|null|    null|
|   4|null|    null|
|   5|null|    null|
|   6|null|    null|
|   7|null|    null|
|   8|null|    null|
|   9|null|    null|
|  10|null|    null|
+----+----+--------+

root
 |-- col1: integer (nullable = false)
 |-- test: string (nullable = true)
 |-- testCast: long (nullable = true)
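
As for the likely cause: in the question the closing parenthesis sits before AS, so CAST(SaleAmount) AS LONG appears to be parsed as a one-argument function call cast(SaleAmount) with the alias LONG. The stack trace suggests the function registry then fails while building its "wrong number of arguments" message. The sketch below applies both forms of the cast to a small stand-in for the CSV data. The sample values and the local SparkSession are assumptions made for illustration. In spark-shell or Zeppelin, spark already exists and df would come from spark.read as in the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr}

// Local session for illustration only; spark-shell and Zeppelin already provide `spark`.
val spark = SparkSession.builder().master("local[*]").appName("cast-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the CSV: without schema inference, CSV columns are read as strings.
val df = Seq("100", "250", "not-a-number").toDF("SaleAmount")

// Option 1: Column.cast, as in the answer above.
val viaColumn = df.withColumn("columnCast", col("SaleAmount").cast("long"))

// Option 2: SQL expression with the target type inside the parentheses.
val viaExpr = df.withColumn("columnCast", expr("CAST(SaleAmount AS LONG)"))

viaColumn.printSchema()
viaExpr.show()
```

Both variants should produce a columnCast column of type long. With default settings, values that cannot be parsed as a number are expected to come back as null rather than raise an error.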

Upvotes: 3
