Reputation: 5389
I am using Spark 1.5.0 and I have this issue:
val df = paired_rdd.reduceByKey {
  case (val1, val2) => val1 + "|" + val2
}.toDF("user_id", "description")
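For context, here is a minimal sketch of the setup the snippet above assumes; the sample pairs and the local SparkContext are made up for illustration, and in Spark 1.5 .toDF on an RDD requires the SQLContext implicits to be in scope:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[*]", "weights")   // hypothetical local context
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._                      // required for rdd.toDF(...)

// Hypothetical (user_id, "text#text#weight") pairs feeding reduceByKey above
val paired_rdd = sc.parallelize(Seq(
  ("user1", "book1#author1#0.078"),
  ("user1", "tool1#desc1#0.270")
))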
Here is sample data for df; as you can see, the description column has this format (text1#text2#weight|text1#text2#weight|...):
user_id: user1
description: book1#author1#0.07841217886795074|tool1#desc1#0.27044260397331488|song1#album1#-0.052661673730870676|item1#category1#-0.005683148395350108
I want to sort this df based on weight in descending order. Here is what I tried:
First, split the contents at "|"; then, for each of those strings, split at "#", take the 3rd token (the weight), and convert it to a Double:
val getSplitAtWeight = udf((str: String) => {
  str.split("|").foreach(_.split("#")(2).toDouble)
})
Then sort based on the weight value returned by the udf, in descending order:
val df_sorted = df.sort(getSplitAtWeight(col("description")).desc)
I get the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type Unit is not supported
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:153)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:64)
    at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:29)
    at org.apache.spark.sql.functions$.udf(functions.scala:2242)
Upvotes: 0
Views: 11394
Reputation: 215117
Changing foreach in your udf to map as follows will eliminate the exception:
def getSplitAtWeight = udf((str: String) => {
  str.split('|').map(_.split('#')(2).toDouble)
})
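Note that this udf returns an Array[Double], one value per "|"-separated entry. If you want a single numeric sort key per row, one possible variation (my assumption, not part of the original answer) is to collapse the weights into one Double, e.g. the maximum:

import org.apache.spark.sql.functions.{udf, col}

// Variation (assumption): return the largest weight per row,
// so the sort key is a plain numeric column.
val getMaxWeight = udf((str: String) => {
  str.split('|').map(_.split('#')(2).toDouble).max
})

val df_sorted = df.sort(getMaxWeight(col("description")).desc)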
The problem with your method is that foreach (on Array, List, or any Scala collection) doesn't return anything: its result is of type Unit, and Spark cannot derive a schema for Unit, which is why you get the exception, whereas map returns the collection of transformed values. Note also that the fixed version splits on the Char '|' rather than the String "|": the String overload of split treats its argument as a regex, and "|" as a regex matches the empty string, splitting between every character, whereas the Char overload splits on the literal character. To understand more about foreach, check this blog.
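You can see the foreach/map difference directly in the Scala REPL, independent of Spark:

val parts = "a#b#1.5|c#d#0.2".split('|')

parts.foreach(_.split('#')(2).toDouble)            // result type is Unit: the values are discarded
val weights = parts.map(_.split('#')(2).toDouble)  // returns Array(1.5, 0.2)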
Upvotes: 2