xoard

Reputation: 21

Spark DataFrame instance a new column

I have a DataFrame. I groupByKey on my first column to get a String array, and I would like to instantiate a new object in a new column of my DataFrame.

+-----------+-----------------------------------------------------------+
|name       |Thing                                                      |
+-----------+-----------------------------------------------------------+
|253        |[a, b, c, d, e]                                            |
|095        |[f, g]                                                     |
|282        |[h, i, j]                                                  |
+-----------+-----------------------------------------------------------+

The object I want to instantiate has this constructor:

public MyObject(String name,
               String[] Thing)

I defined a case class to build the DataFrame:

case class Myclass(name: String, Thing: Array[String])

To achieve this I use a UDF:

def myFunction(name: String, Thing: Array[String]): MyObject = {
  new MyObject(name, Thing)
}

My code looks like this:

var my_df = my_old_df.map(line=>(line(0).asInstanceOf[String],line(1).asInstanceOf[String]))
  .groupByKey()

val my_next_df : DataFrame= my_df.map(line => Myclass(line._1.toString,line._2.toArray)).toDF()

val myudf= sqlContext.udf.register("myudf", myFunction _)

val my_df_problem = my_next_df.withColumn("Object", myudf($"name", $"Thing"))

Instantiation fails with: java.lang.UnsupportedOperationException: Schema for type Library.class is not supported

Upvotes: 1

Views: 474

Answers (1)

Mohammed Rafi

Reputation: 88

It seems the UDF must return your `Myclass` case class type (a type Spark can derive a schema for), rather than the plain `MyObject`:

val myudf = sqlContext.udf
  .register("myudf", (name: String, thing: Array[String]) => Myclass(name, thing))
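The reason this works: Spark's schema derivation handles case classes because they extend `Product`, which exposes their fields one by one; an arbitrary Java-style class like `MyObject` has no such structure, hence the "Schema for type ... is not supported" error. A minimal plain-Scala sketch (no Spark session needed here; `MyObject` is stood in for by a hypothetical Java-style class) illustrating the difference:

```scala
// Hypothetical plain class, like the Java MyObject: Spark cannot derive a
// schema for it, which triggers the UnsupportedOperationException in a UDF.
class MyObject(val name: String, val thing: Array[String])

// Case classes extend Product, so Spark's encoders can derive a StructType
// schema from them (name -> StringType, Thing -> ArrayType(StringType)).
case class Myclass(name: String, Thing: Array[String])

object SchemaDemo {
  def main(args: Array[String]): Unit = {
    val m = Myclass("253", Array("a", "b", "c"))
    // Product gives field-by-field access, which schema derivation relies on.
    println(m.isInstanceOf[Product]) // true
    println(m.productArity)          // 2 fields: name, Thing

    val o = new MyObject("253", Array("a", "b", "c"))
    println(o.isInstanceOf[Product]) // false: no schema can be derived
  }
}
```

So returning `Myclass` from the UDF produces a struct column with `name` and `Thing` fields, while returning `MyObject` has no representable schema.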

Upvotes: 1
