WestCoastProjects

Reputation: 63062

Getting [B cannot be cast to java.lang.String when using Spark SQL

My issue arises when trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people have this issue with the Scala API too.

The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:

java.lang.ClassCastException: [B cannot be cast to java.lang.String
 at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)

So what we have is one of the fields being represented as a byte array. The following Python printing constructs do NOT work:

repr(sqlRdd.take(2))

Also

import pprint
pprint.pprint(sqlRdd.take(2))

Both result in the ClassCastException.

So.. how do other folks do this? I started to roll my own (cannot copy/paste it here, unfortunately..), but that feels like reinventing the wheel .. or so I suspect.
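To give a rough idea of the shape of it (a hypothetical sketch, not my actual code, and assuming the rows can reach Python with the binary columns arriving as bytearray values holding UTF-8 text):

def decode_fields(row):
    # Hypothetical helper: decode any byte-array fields by hand,
    # assuming they are UTF-8 encoded text; pass other values through.
    return tuple(v.decode("utf-8") if isinstance(v, bytearray) else v
                 for v in row)

decoded = sqlRdd.map(decode_fields)
print(decoded.take(2))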

Upvotes: 3

Views: 4263

Answers (1)

samthebest

Reputation: 31513

Try

sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

I think they broke this in Spark 1.1.0: reading binary as strings used to work, then they made it not work, but added this flag, with its default set to false.
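For context, here's a minimal PySpark sketch of where that call goes, assuming a Spark 1.1-era setup (the file path is a placeholder, substitute your own Parquet data):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="binary-as-string")
sqlContext = SQLContext(sc)

# Must be set before the Parquet file is read, so the reader
# treats BINARY columns as strings instead of raw byte arrays.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

rows = sqlContext.parquetFile("/path/to/data.parquet")
print(rows.take(2))  # string fields now come back as strings, not [B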

Upvotes: 4
