Erik Barajas

Reputation: 193

Create new DataFrame with new rows depending on the value of a column - Spark Scala

I have a DataFrame with the following data:

  num_cta   | n_lines
110000000000|   2
110100000000|   3
110200000000|   1

With that information, I need to create a new DF with a different number of rows per record, depending on the value in the n_lines column.

For example, for the first row of my DF (110000000000), the value of the n_lines column is 2. The result would have to be something like the following:

  num_cta   
110000000000
110000000000

For the whole example DataFrame shown above, the expected result would be something like this:

  num_cta  
110000000000
110000000000
110100000000
110100000000
110100000000
110200000000

Is there a way to do that, i.e. multiply a row n times depending on the value of a column?

Regards.

Upvotes: 0

Views: 1080

Answers (2)

Leo C

Reputation: 22439

One approach would be to expand n_lines into an array with a UDF and then explode the array:

import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  ("110000000000", 2),
  ("110100000000", 3),
  ("110200000000", 1)
).toDF("num_cta", "n_lines")

// UDF that builds an array of length n (the contents don't matter, only the size)
def fillArr = udf(
  (n: Int) => Array.fill(n)(1)
)

// Explode the generated array so that each num_cta is repeated n_lines times
val df2 = df.withColumn("arr", fillArr($"n_lines")).
  withColumn("a", explode($"arr")).
  select($"num_cta")

df2.show
+------------+
|     num_cta|
+------------+
|110000000000|
|110000000000|
|110100000000|
|110100000000|
|110100000000|
|110200000000|
+------------+
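As a side note, on Spark 2.4 or later the same result can be obtained without a UDF by combining the built-in array_repeat and explode functions. A minimal sketch, assuming the same df as above:

// UDF-free variant (requires Spark 2.4+ for array_repeat)
val df3 = df.
  withColumn("a", explode(array_repeat(lit(1), $"n_lines"))).
  select($"num_cta")

df3.show  // same output as df2 above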

Upvotes: 1

Avishek Bhattacharya

Reputation: 6974

There is no off-the-shelf way of doing this. However, you can iterate over the DataFrame and, for each row, return a list of num_cta values whose length equals the corresponding n_lines.

Something like

import spark.implicits._

case class Output(num_cta: String)               // output dataframe schema
case class Input(num_cta: String, n_lines: Int)  // input dataframe 'df' schema

// Convert to a typed Dataset, then emit each num_cta n_lines times
val result = df.as[Input].flatMap(x => {
  List.fill(x.n_lines)(Output(x.num_cta))
}).toDF
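Here Input and Output are just placeholder names for the input and output schemas (the original snippet left the case classes unnamed), and the flatMap runs over the typed Dataset view of df. A quick sanity check:

// Should print the same six replicated num_cta rows as the expected output in the question
result.show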

Upvotes: 1
