Ganesha
Ganesha

Reputation: 145

Partial Replication of DataFrame rows

I have a Dataframe which has the following structure and data

Source:

Column1(String), Column2(String), Date
-----------------------
1, 2, 01/01/2021
A, B, 02/01/2021 
M, N, 05/01/2021

I want to transform it to the following (First 2 columns are replicated in values and date is incremented until a fixed date (until 07/01/2021 in this example) for each of the source row)

To Result:

1, 2, 01/01/2021
1, 2, 02/01/2021
1, 2, 03/01/2021
1, 2, 04/01/2021
1, 2, 05/01/2021
1, 2, 06/01/2021
1, 2, 07/01/2021
A, B, 02/01/2021 
A, B, 03/01/2021 
A, B, 04/01/2021 
A, B, 05/01/2021 
A, B, 06/01/2021 
A, B, 07/01/2021 
M, N, 05/01/2021
M, N, 06/01/2021
M, N, 07/01/2021

Any idea on how this can be achieved in scala spark?

I got this link Replicate Spark Row N-times, but there is no hint on how a particular column can be incremented during replication.

Upvotes: 1

Views: 47

Answers (1)

Mohana B C
Mohana B C

Reputation: 5487

We can use sequence function to generate list of dates in required range, then explode the output array of sequence function to get dataframe in required format.

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
spark.sparkContext.setLogLevel("ERROR")

// Sample dataframe
val df = List(("1", "2", "01/01/2021"),
              ("A", "B", "02/01/2021"),
              ("M", "N", "05/01/2021"))
            .toDF("Column1(String)", "Column2(String)", "Date")

df
  .withColumn("Date",explode_outer(sequence(to_date('Date,"dd/MM/yyyy"),
                                    to_date(lit("07/01/2021"),"dd/MM/yyyy"))))
  .withColumn("Date",date_format('Date,"dd/MM/yyyy"))
  .show(false)

/*
+---------------+---------------+----------+
|Column1(String)|Column2(String)|Date      |
+---------------+---------------+----------+
|1              |2              |01/01/2021|
|1              |2              |02/01/2021|
|1              |2              |03/01/2021|
|1              |2              |04/01/2021|
|1              |2              |05/01/2021|
|1              |2              |06/01/2021|
|1              |2              |07/01/2021|
|A              |B              |02/01/2021|
|A              |B              |03/01/2021|
|A              |B              |04/01/2021|
|A              |B              |05/01/2021|
|A              |B              |06/01/2021|
|A              |B              |07/01/2021|
|M              |N              |05/01/2021|
|M              |N              |06/01/2021|
|M              |N              |07/01/2021|
+---------------+---------------+----------+ */

Upvotes: 1

Related Questions