Reputation: 48003
What is the method to create a DataFrame from an RDD that has been saved as an object file? I want to load the RDD, but I don't have a Java object, only a StructType that I want to use as the schema for the DataFrame.
I tried retrieving it as a Row:
val myrdd = sc.objectFile[org.apache.spark.sql.Row]("/home/bipin/"+name)
But I get
java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to org.apache.spark.sql.Row
Is there a way to do this?
Edit
From what I understand, I have to read the RDD as an array of objects and convert it to a Row. If anyone can give a method for this, it would be acceptable.
Upvotes: 0
Views: 2103
Reputation: 930
If you have an Array[Object], you only have to use the Row apply method for an array of Any. In code it will be something like this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row(x))
EDIT
You are right @user568109, this will create a DataFrame with only one field that will be an Array. To parse the whole array you have to do this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row.fromSeq(x.toSeq))
As @user568109 said, there are other ways to do this:
val myrdd = sc.objectFile[Array[Object]]("/home/bipin/"+name).map(x => Row(x:_*))
It doesn't matter which one you use, because both are wrappers for the same code:
/**
* This method can be used to construct a [[Row]] with the given values.
*/
def apply(values: Any*): Row = new GenericRow(values.toArray)
/**
* This method can be used to construct a [[Row]] from a [[Seq]] of values.
*/
def fromSeq(values: Seq[Any]): Row = new GenericRow(values.toArray)
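Putting this together with the StructType from the question, a minimal end-to-end sketch could look like the following (the field names in the schema and the sqlContext variable are assumptions for illustration; substitute your own StructType):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// hypothetical schema; replace with the StructType you already have
val schema = StructType(Seq(
  StructField("item", StringType, nullable = true),
  StructField("price", StringType, nullable = true)))

// load the object file as Array[Object] and turn each array into a Row
val rowRDD = sc.objectFile[Array[Object]]("/home/bipin/" + name)
  .map(arr => Row.fromSeq(arr.toSeq))

// create the DataFrame with the explicit schema
val myDF = sqlContext.createDataFrame(rowRDD, schema)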
Upvotes: 1
Reputation: 411
Let me add some explanation.
Suppose you have a MySQL table grocery with the columns (grocery_id, item, category, price) and the contents below:
+------------+---------+----------+-------+
| grocery_id | item | category | price |
+------------+---------+----------+-------+
| 1 | tomato | veg | 2.40 |
| 2 | raddish | veg | 4.30 |
| 3 | banana | fruit | 1.20 |
| 4 | carrot | veg | 2.50 |
| 5 | apple | fruit | 8.10 |
+------------+---------+----------+-------+
5 rows in set (0.00 sec)
Now you want to read it within Spark; your code will be something like below:
import java.sql.DriverManager
import org.apache.spark.rdd.JdbcRDD

val groceryRDD = new JdbcRDD(sc, () => DriverManager.getConnection(url, uname, passwd),
  "select item,price from grocery limit ?,?", 1, 10, 2,
  r => r.getString("item") + "|" + r.getString("price"))
Note: in the above statement I converted the ResultSet into a String: r => r.getString("item") + "|" + r.getString("price")
So my JdbcRDD will be:
groceryRDD: org.apache.spark.rdd.JdbcRDD[String] = JdbcRDD[29] at JdbcRDD at <console>:21
Now you save it:
groceryRDD.saveAsObjectFile("/user/cloudera/jdbcobject")
Answer to your question
While reading the object file, you need to write it as below:
val newJdbObjectFile = sc.objectFile[String]("/user/cloudera/jdbcobject")
Simply substitute the type parameter of the RDD you saved. In my case, groceryRDD has String as its type parameter, hence I have used the same.
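From there, turning those pipe-delimited strings back into a DataFrame might look like this (a sketch; the sqlContext, the field names, and the escaped split are my additions, not part of the original answer):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// hypothetical two-field schema matching the "item|price" strings saved above
val schema = StructType(Seq(
  StructField("item", StringType, nullable = true),
  StructField("price", StringType, nullable = true)))

// "|" is a regex metacharacter, so it must be escaped in split
val rows = newJdbObjectFile.map(line => Row.fromSeq(line.split("\\|").toSeq))
val df = sqlContext.createDataFrame(rows, schema)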
UPDATE:
In your case, as mentioned by jlopezmat, you need to use Array[Object].
Here each row of the RDD will be an Object, but since you converted it using resultSetToObjectArray, each row with its contents will be saved as an Array.
That is, in my case, if I save the above RDD as below:
val groceryRDD = new JdbcRDD(sc, ()=> DriverManager.getConnection(url,uname,passwd), "select item,price from grocery limit ?,?",1,10,2,r => JdbcRDD.resultSetToObjectArray(r))
and when I read it back and collect the data:
val newJdbcObjectArrayRDD = sc.objectFile[Array[Object]]("...")
val result = newJdbcObjectArrayRDD.collect
the result will be of type Array[Array[Object]]:
result: Array[Array[Object]] = Array(Array(raddish, 4.3), Array(banana, 1.2), Array(carrot, 2.5), Array(apple, 8.1))
You can parse the above based on your column definitions; a sketch follows below.
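For instance, a sketch of that parsing, assuming the two columns are item (String) and price (Double); the names and types here are illustrative, not from the original answer:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// map each Array[Object] back to a typed Row; the casts assume (item, price) columns
val typedRows = newJdbcObjectArrayRDD.map(arr =>
  Row(arr(0).toString, arr(1).toString.toDouble))

val schema = StructType(Seq(
  StructField("item", StringType, nullable = true),
  StructField("price", DoubleType, nullable = true)))

val groceryDF = sqlContext.createDataFrame(typedRows, schema)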
Please let me know if it answered your question.
Upvotes: 0