Reputation: 21
I have a DataFrame. I call groupByKey on the first column to get an array of strings per key, and I would like to instantiate a new object in a new column of the DataFrame.
+----+---------------+
|name|Thing          |
+----+---------------+
|253 |[a, b, c, d, e]|
|095 |[f, g]         |
|282 |[h, i, j]      |
+----+---------------+
The object I want to instantiate has this constructor:
public MyObject(String name,
String[] Thing)
I define a case class so that I can build the DataFrame:
case class Myclass(name: String, Thing: Array[String])
To achieve this, I use a UDF:
def myFunction(name: String, Thing: Array[String]): MyObject = {
  new MyObject(name, Thing)
}
My code looks like this:
var my_df = my_old_df.map(line=>(line(0).asInstanceOf[String],line(1).asInstanceOf[String]))
.groupByKey()
val my_next_df : DataFrame= my_df.map(line => Myclass(line._1.toString,line._2.toArray)).toDF()
val myudf= sqlContext.udf.register("myudf", myFunction _)
val my_df_problem = my_next_df.withColumn("Object", myudf($"name", $"Thing"))
I get an error when the object is instantiated: java.lang.UnsupportedOperationException: Schema for type Library.class is not supported
Upvotes: 1
Views: 474
Reputation: 88
It seems the UDF must return the Myclass case class rather than MyObject, because Spark needs a return type it can derive a schema for.
// Array columns are normally passed to a Scala UDF as Seq, so take Seq[String] and convert.
val myudf = sqlContext.udf
  .register("myudf", (name: String, thing: Seq[String]) => Myclass(name, thing.toArray))
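For reference, here is a fuller sketch of the same idea. It assumes Spark 2.x or later with a local SparkSession, whereas the question uses the older sqlContext API. The object name UdfReturningCaseClass and the helper withObject are made up for illustration. The sample rows are taken from the table in the question. The UDF returns the case class, which Spark should store as a struct column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

// Case class from the question, defined at top level so Spark can derive its schema.
case class Myclass(name: String, Thing: Array[String])

// Hypothetical wrapper object, only here to make the sketch self-contained.
object UdfReturningCaseClass {

  // Adds an "Object" column built from the "name" and "Thing" columns.
  def withObject(df: DataFrame): DataFrame = {
    // The array column is received as Seq[String] and converted to Array[String].
    val myudf = udf((name: String, thing: Seq[String]) => Myclass(name, thing.toArray))
    df.withColumn("Object", myudf(col("name"), col("Thing")))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("udf-case-class")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the table in the question.
    val df = Seq(
      ("253", Seq("a", "b", "c", "d", "e")),
      ("095", Seq("f", "g")),
      ("282", Seq("h", "i", "j"))
    ).toDF("name", "Thing")

    val result = withObject(df)
    result.printSchema()
    result.show(false)

    spark.stop()
  }
}
```

If you really need a MyObject instance, one option is to keep the struct column and build MyObject from its fields after collecting the rows or mapping over a typed Dataset, since a DataFrame column cannot hold an arbitrary class.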
Upvotes: 1