Cdr

Reputation: 581

Dynamic dataframe with n columns and m rows

I'm reading data from JSON (dynamic schema) and loading it into a DataFrame.

Example Dataframe:

scala> import spark.implicits._
import spark.implicits._

scala> val DF = Seq(
     (1, "ABC"),
     (2, "DEF"),
     (3, "GHIJ")
     ).toDF("id", "word")
DF: org.apache.spark.sql.DataFrame = [id: int, word: string]

scala> DF.show
+---+----+
| id|word|
+---+----+
|  1| ABC|
|  2| DEF|
|  3|GHIJ|
+---+----+

Requirement: The column count and names can be anything. I want to read the rows in a loop and fetch each column one by one, then process each value in subsequent flows. I need both the column name and the value. I'm using Scala.

Python:
for i, j in df.iterrows(): 
    print(i, j) 

I need the same functionality in Scala, with the column name and the value fetched separately.

Kindly help.

Upvotes: 0

Views: 362

Answers (1)

Raphael Roth

Reputation: 27373

df.iterrows is not from PySpark but from pandas. In Spark, you can use foreach:

import org.apache.spark.sql.Row

DF.foreach { case Row(id: Int, word: String) => println(id, word) }

Result (the row order is not deterministic, since foreach runs in parallel on the executors):

(2,DEF)
(3,GHIJ)
(1,ABC)

If you don't know the number of columns, you cannot use unapply on Row; in that case, just do:

DF.foreach(row => println(row))

Result:

[1,ABC]
[2,DEF]
[3,GHIJ]

and operate on each Row using its methods such as getAs.
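
If you also need the column name paired with each value (as the question asks), here is a minimal sketch, assuming the rows carry their schema (rows produced by a DataFrame do): take the field names from row.schema.fieldNames and look each one up with getAs.

DF.foreach { row =>
  // Works for any number of columns: pair each field name with its value.
  // getAs[Any] is used because the schema is dynamic.
  row.schema.fieldNames.foreach { name =>
    println(name, row.getAs[Any](name))
  }
}

Each row then prints pairs like (id,1) and (word,ABC), again in no particular row order.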

Upvotes: 2
