Apache Spark Row getAs[String]: java.lang.Byte cannot be cast to java.lang.String

Question

I have a Spark DataFrame that looks like this:

+-----------+-----+
|foo        |  bar|
+-----------+-----+
|          3|10119|
|          2| 4305|
+-----------+-----+

It has the following schema:

org.apache.spark.sql.types.StructType = StructType(
    StructField(foo,ByteType,true), 
    StructField(bar,LongType,false)
)

As you can see, the column foo is of ByteType.

I need to get the value of foo in the first row as a String.

When I try

val fooStr = df.first.getAs[String](0)

I get a ClassCastException:

java.lang.ClassCastException: java.lang.Byte cannot be cast to java.lang.String

However, when I use toString, I get the String without an error:

val myStr = df.first.get(0).toString

Why do I get a ClassCastException when I use Row.getAs[String], but no error when I use toString? Is there any drawback to using toString?

werner · Accepted Answer

Row.getAs[T](i) is defined as

def getAs[T](i: Int): T = get(i).asInstanceOf[T]

asInstanceOf[T] simply casts the object to the desired type without any further transformation. If the type returned by get(i) and the desired type are incompatible (such as Byte and String), a ClassCastException is thrown.
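The cast behaviour can be reproduced without Spark. The following is only a sketch: getAsSketch is a hypothetical helper that mirrors the shape of the definition above but reads from a plain Seq[Any] instead of a Row, and the values are taken from the first row of the question's DataFrame.

```scala
import scala.util.Try

// Hypothetical stand-in for Row.getAs: same shape as the definition above,
// but backed by a plain Seq[Any] rather than a Spark Row (an assumption
// made so the sketch runs without a Spark dependency).
def getAsSketch[T](values: Seq[Any], i: Int): T = values(i).asInstanceOf[T]

// A ByteType value is held as a boxed java.lang.Byte, a LongType value as a Long.
val row: Seq[Any] = Seq(java.lang.Byte.valueOf(3.toByte), 10119L)

// Asking for String fails: a Byte is not a String and no conversion is attempted.
val asString: Try[String] = Try(getAsSketch[String](row, 0))

// Asking for the type the value really has succeeds.
val asByte: Byte = getAsSketch[Byte](row, 0)
```

Applied to the question, this suggests that df.first.getAs[Byte](0) would be the type parameter matching the ByteType column.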

Calling toString on the return value of get(0), however, invokes Byte.toString(). This is not a cast but a regular method call that returns a String.
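As for drawbacks: the schema in the question marks foo as nullable, and get(i) returns null for a null cell, so calling toString directly on it would throw a NullPointerException. A minimal sketch of a null-safe variant follows; cellToString is a hypothetical helper name, and the two values stand in for what get(0) might return.

```scala
// Stand-ins for what Row.get(0) can return for a nullable ByteType column.
val present: Any = java.lang.Byte.valueOf(3.toByte)
val missing: Any = null

// toString is an ordinary method call, so it works on any non-null value.
val s: String = present.toString

// Hypothetical helper: Option(null) is None, so toString is never
// invoked on a null cell.
def cellToString(cell: Any): Option[String] = Option(cell).map(_.toString)

val fromPresent: Option[String] = cellToString(present)
val fromMissing: Option[String] = cellToString(missing)
```

With a real Row, the same idea would be Option(df.first.get(0)).map(_.toString), or a check with isNullAt(0) before reading the value.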
