Reputation: 642
How do I iterate through Spark DataFrame rows and add them to a Sequence of case class objects?
DF1:
val someDF = Seq(
("202003101750", "202003101700",122),
("202003101800", "202003101700",12),
("202003101750", "202003101700",42)
).toDF("number", "word","value")
Case Class:
case class ValuePerNumber(num: String, wrd: String, defaultID: Int, size: Long = 0)
Expected Output:
Seq(ValuePerNumber("202003101750", "202003101700",0, 122), ValuePerNumber("202003101800", "202003101700",0, 12), ValuePerNumber("202003101750", "202003101700",0, 42))
In each case the defaultID can be 0. I am not sure how to approach and solve this problem and would really appreciate any solution or suggestion!
I have tried the following:
val x = someDF.as[ValuePerNumber].collect()
I get the following error:
org.apache.spark.sql.AnalysisException: cannot resolve '`num`' given input columns: [number, word, value];
Upvotes: 2
Views: 4924
Reputation: 23109
You can create a Dataset[ValuePerNumber] and collect it as a Seq:
val someDF = Seq(
("202003101750", "202003101700",122),
("202003101800", "202003101700",12),
("202003101750", "202003101700",42)
).toDF("number", "word","value")
val result = someDF.map(r => ValuePerNumber(r.getAs[String](0), r.getAs[String](1), 0, r.getAs[Int](2).toLong)).collect().toSeq
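As a plain-Scala sanity check of the same per-row mapping (no Spark required; the case class is redefined locally just for this sketch), the source tuples map to the expected Seq like this:

```scala
// Standalone sketch of the row-to-case-class mapping, without Spark.
// The case class mirrors the one in the question; defaultID is fixed to 0
// and the third tuple element becomes size, matching the expected output.
case class ValuePerNumber(num: String, wrd: String, defaultID: Int, size: Long = 0)

val rows = Seq(
  ("202003101750", "202003101700", 122),
  ("202003101800", "202003101700", 12),
  ("202003101750", "202003101700", 42)
)

val result = rows.map { case (num, wrd, value) =>
  ValuePerNumber(num, wrd, 0, value.toLong)
}
// result.head == ValuePerNumber("202003101750", "202003101700", 0, 122)
```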
You can also add the missing column to the DataFrame and rename the columns to match the case class, so that you can directly do:
val x = someDF.as[ValuePerNumber].collect()
Upvotes: 2
Reputation: 10382
The column count and column names in the DataFrame and the case class must match in order to use as[ValuePerNumber]
on the DataFrame directly, without extracting values.
size
is not available in the DataFrame, so it is added using withColumn.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.lit

val someDF = Seq(("202003101750", "202003101700", 122), ("202003101800", "202003101700", 12), ("202003101750", "202003101700", 42))
.toDF("number", "word", "value")
.withColumn("size", lit(0)) // Added this to match your case class columns
// Exiting paste mode, now interpreting.
someDF: org.apache.spark.sql.DataFrame = [number: string, word: string ... 2 more fields]
scala> case class ValuePerNumber(number:String, word:String, value:Int, size: Long=0) // Modified column names to match your dataframe column names.
defined class ValuePerNumber
scala> someDF.as[ValuePerNumber].show(false)
+------------+------------+-----+----+
|number |word |value|size|
+------------+------------+-----+----+
|202003101750|202003101700|122 |0 |
|202003101800|202003101700|12 |0 |
|202003101750|202003101700|42 |0 |
+------------+------------+-----+----+
Upvotes: 4
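One note on the size default, illustrated in plain Scala: the = 0 default is filled in by the compiler only when you construct the case class directly. Spark's encoder resolves fields by column name, which is why the answer above still adds an explicit size column with withColumn("size", lit(0)):

```scala
// The default parameter only applies at direct construction time;
// Spark's encoder matches fields to DataFrame columns by name instead.
case class ValuePerNumber(number: String, word: String, value: Int, size: Long = 0)

val v = ValuePerNumber("202003101750", "202003101700", 122)
// v.size == 0L, supplied by the compiler default
```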
Reputation: 1758
val someDF = Seq(
("202003101750", "202003101700",122),
("202003101800", "202003101700",12),
("202003101750", "202003101700",42)
).toDF("number", "word","value")
case class ValuePerNumber(number: String, word: String, defaultID: Int, value: Long)

import org.apache.spark.sql.functions.lit
someDF.withColumn("defaultID", lit(0)).as[ValuePerNumber].collect.toSeq
Upvotes: 3