Reputation: 360
Assuming the following DataFrame df1:
df1:
+---------+--------+-------+
|A |B |C |
+---------+--------+-------+
|toto |tata |titi |
+---------+--------+-------+
I have an integer N = 3 that I want to use to create 3 duplicates of df1's row in a new DataFrame df2:
df2:
+---------+--------+-------+
|A |B |C |
+---------+--------+-------+
|toto |tata |titi |
|toto |tata |titi |
|toto |tata |titi |
+---------+--------+-------+
Any ideas?
Upvotes: 2
Views: 125
Reputation: 4045
You can use foldLeft along with DataFrame's union:
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinDataFrames {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = List(("toto", "tata", "titi")).toDF("A", "B", "C")
    val N = 3
    // Union df with itself N - 1 times, so the result holds N copies of every row
    val resultDf = (1 until N).foldLeft(df)((acc: DataFrame, _: Int) => acc.union(df))
    resultDf.show()
  }
}
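If you prefer to avoid foldLeft, the same union idea can be written more compactly. This is a minimal sketch assuming the same df and N as above:

// Build N references to the same DataFrame and union them pairwise
val resultDf = Seq.fill(N)(df).reduce(_ union _)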
Upvotes: 1
Reputation: 31460
From Spark 2.4+, use the arrays_zip + array_repeat + explode functions for this case:
import org.apache.spark.sql.functions._

val df = Seq(("toto","tata","titi")).toDF("A","B","C")

// explode an array of 3 zipped copies of the row, then drop the helper column
df.withColumn("arr", explode(array_repeat(arrays_zip(array("A"), array("B"), array("C")), 3))).
  drop("arr").
  show(false)

// or dynamic way
val cols = df.columns.map(x => col(x))
df.withColumn("arr", explode(array_repeat(arrays_zip(array(cols: _*)), 3))).
  drop("arr").
  show(false)
//+----+----+----+
//|A |B |C |
//+----+----+----+
//|toto|tata|titi|
//|toto|tata|titi|
//|toto|tata|titi|
//+----+----+----+
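A minimal sketch along the same lines, assuming the same df and Spark 2.4+: since only the number of copies matters, you can skip arrays_zip and explode a repeated literal instead.

import org.apache.spark.sql.functions.{array_repeat, explode, lit}

// Explode an array of 3 dummy values to triplicate each row, then drop the helper column
df.withColumn("n", explode(array_repeat(lit(1), 3))).drop("n").show(false)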
Upvotes: 1