Reputation: 171
I have a DataFrame in Spark with the following data:
{ID:"1",CNT:"2", Age:"21", Class:"3"}
{ID:"2",CNT:"3", Age:"24", Class:"5"}
I want to expand each row of the DataFrame based on its CNT value and generate output like below:
{ID:"1",CNT:"1", Age:"21", Class:"3"}
{ID:"1",CNT:"2", Age:"21", Class:"3"}
{ID:"2",CNT:"1", Age:"24", Class:"5"}
{ID:"2",CNT:"2", Age:"24", Class:"5"}
{ID:"2",CNT:"3", Age:"24", Class:"5"}
Does anyone have an idea how to achieve this?
Upvotes: 1
Views: 2190
Reputation: 2506
In case you prefer a solution using the DataFrame API only, here we go:
import org.apache.spark.sql.functions.{col, explode, udf}

case class Person(ID: Int, CNT: Int, Age: Int, Class: Int)

// UDF that builds the inclusive range 1..input as an array.
val iterations: Int => Array[Int] = (input: Int) => (1 to input).toArray
val udf_iterations = udf(iterations)
val p1 = Person(1, 2, 21, 3)
val p2 = Person(2, 3, 24, 5)
val records = Seq(p1, p2)
val df = spark.createDataFrame(records)
df.withColumn("CNT-NEW", explode(udf_iterations(col("CNT")))) // one row per array element
  .drop(col("CNT"))
  .withColumnRenamed("CNT-NEW", "CNT")
  .select(df.columns.map(col): _*) // restore the original column order
  .show(false)
+---+---+---+-----+
|ID |CNT|Age|Class|
+---+---+---+-----+
|1 |1 |21 |3 |
|1 |2 |21 |3 |
|2 |1 |24 |5 |
|2 |2 |24 |5 |
|2 |3 |24 |5 |
+---+---+---+-----+
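As an aside, on Spark 2.4+ the built-in sequence function (from org.apache.spark.sql.functions) could replace the UDF entirely, e.g. explode(sequence(lit(1), col("CNT"))). The UDF itself just materializes the inclusive range 1..CNT as an array for explode to unpack; its core logic can be checked outside Spark with a minimal sketch:

```scala
object RangeDemo {
  // Same body as the `iterations` UDF above: build the inclusive range 1..input.
  val iterations: Int => Array[Int] = (input: Int) => (1 to input).toArray

  def main(args: Array[String]): Unit = {
    // explode would turn this array into 3 rows.
    println(iterations(3).mkString(",")) // 1,2,3
  }
}
```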
Upvotes: 2
Reputation: 215127
You can convert the DataFrame to an RDD, use flatMap to expand it and then convert it back to a DataFrame:
// toDF / as[Person] require: import spark.implicits._
val df = Seq((1, 2, 21, 3), (2, 3, 24, 5)).toDF("ID", "CNT", "Age", "Class")
case class Person(ID: Int, CNT: Int, Age: Int, Class: Int)

// For each person, emit one record per CNT value in 1..CNT.
df.as[Person].rdd.flatMap(p => (1 to p.CNT).map(Person(p.ID, _, p.Age, p.Class))).toDF.show
+---+---+---+-----+
| ID|CNT|Age|Class|
+---+---+---+-----+
| 1| 1| 21| 3|
| 1| 2| 21| 3|
| 2| 1| 24| 5|
| 2| 2| 24| 5|
| 2| 3| 24| 5|
+---+---+---+-----+
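The per-record fan-out that flatMap performs here can be exercised on a plain Scala collection, without a Spark cluster. A minimal sketch, with a Person case class matching the one above and a hypothetical expand helper:

```scala
object ExpandDemo {
  case class Person(ID: Int, CNT: Int, Age: Int, Class: Int)

  // One output record per value 1..CNT, all other fields unchanged.
  def expand(people: Seq[Person]): Seq[Person] =
    people.flatMap(p => (1 to p.CNT).map(n => p.copy(CNT = n)))

  def main(args: Array[String]): Unit = {
    val out = expand(Seq(Person(1, 2, 21, 3), Person(2, 3, 24, 5)))
    out.foreach(println) // 2 + 3 = 5 rows, matching the table above
  }
}
```

Spark's RDD flatMap applies exactly this transformation, just distributed across partitions.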
Upvotes: 5