Cdr

Reputation: 581

Dynamic dataframe with n columns and m rows

I'm reading data from JSON (dynamic schema) and loading it into a DataFrame.

Example Dataframe:

scala> import spark.implicits._
import spark.implicits._

scala> val DF = Seq(
     (1, "ABC"),
     (2, "DEF"),
     (3, "GHIJ")
     ).toDF("id", "word")
DF: org.apache.spark.sql.DataFrame = [id: int, word: string]

scala> DF.show
+---+----+
| id|word|
+---+----+
|  1| ABC|
|  2| DEF|
|  3|GHIJ|
+---+----+

Requirement: The column count and names can be anything. I want to read the rows in a loop and fetch each column one by one, then process each value in subsequent flows. I need both the column name and the value. I'm using Scala.

Python:
for i, j in df.iterrows(): 
    print(i, j) 

I need the same functionality in Scala, with the column name and the value fetched separately.

Kindly help.

Upvotes: 0

Views: 362

Answers (1)

Raphael Roth

Reputation: 27373

df.iterrows is not from PySpark but from pandas. In Spark, you can use foreach:

import org.apache.spark.sql.Row

DF.foreach { case Row(id: Int, word: String) => println(id, word) }

Result (the row order is not deterministic, since foreach runs in parallel on the executors):

(2,DEF)
(3,GHIJ)
(1,ABC)

If you don't know the number of columns, you cannot use unapply on Row; in that case, just do:

DF.foreach(row => println(row))

Result:

[1,ABC]
[2,DEF]
[3,GHIJ]

and operate on each Row using its methods such as getAs.
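
If you also need the column name paired with each value (as the question asks), here is a minimal sketch, assuming the rows carry their schema (rows produced by a DataFrame do): take the field names from row.schema.fieldNames and look each one up with getAs.

DF.foreach { row =>
  // Works for any number of columns: pair each field name with its value.
  // getAs[Any] is used because the schema is dynamic.
  row.schema.fieldNames.foreach { name =>
    println(name, row.getAs[Any](name))
  }
}

Each row then prints pairs like (id,1) and (word,ABC), again in no particular row order.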

Upvotes: 2
