mani_nz

Reputation: 5602

Looping through dataframe columns to form a nested dataframe - Spark

I have a DataFrame like this:

val x = Seq(("A", "B", "C", "D")).toDF("DOC", "A1", "A2", "A3")

+---+---+---+---+
|DOC| A1| A2| A3|
+---+---+---+---+
|  A|  B|  C|  D|
+---+---+---+---+

The A columns can go up to A100, so I want to loop over them and nest them all under a common structure, like this:

+---+---------+
|DOC|   A LIST|
+---+---------+
|  A|[B, C, D]|
+---+---------+

I want to build the column names dynamically (A1, A2, ...) by looping from 1 to 100, and then select those columns into the new DataFrame.

How can I do this?

Cheers!

Upvotes: 0

Views: 247

Answers (1)

Leo C

Reputation: 22449

Assemble the list of columns to be combined, convert the column names into Columns via col, and apply the array function to the resulting list:

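// spark-shell imports these automatically; they are needed in a standalone application
import org.apache.spark.sql.functions.{array, col}
import spark.implicits._

val df = Seq(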
  (1, "a", "b", "c", 10.0),
  (2, "d", "e", "f", 20.0)
).toDF("id", "a1", "a2", "a3", "b")

val selectedCols = df.columns.filter(_.startsWith("a")).map(col)
val otherCols = df.columns.map(col) diff selectedCols

df.select((otherCols :+ array(selectedCols: _*).as("a_list")): _*).show
// +---+----+---------+
// | id|   b|   a_list|
// +---+----+---------+
// |  1|10.0|[a, b, c]|
// |  2|20.0|[d, e, f]|
// +---+----+---------+
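
If you would rather generate the names A1 to A100 in a loop, as the question describes, instead of filtering on a prefix, here is a minimal sketch. The object NestColumns, the helpers numberedCols and nest, and the output name A_LIST are made up for illustration; it also assumes a local SparkSession and that missing columns (A4 to A100 in the sample) should simply be skipped:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col}

object NestColumns {
  // Hypothetical helper: generate prefix1..prefixN and keep only the names
  // that actually exist, in numeric order.
  def numberedCols(existing: Seq[String], prefix: String, n: Int): Seq[String] = {
    val present = existing.toSet
    (1 to n).map(i => s"$prefix$i").filter(present.contains)
  }

  // Hypothetical helper: replace the numbered columns with a single array column.
  def nest(df: DataFrame, prefix: String, n: Int, listName: String): DataFrame = {
    val nested = numberedCols(df.columns.toSeq, prefix, n)
    val others = df.columns.toSeq.filterNot(c => nested.contains(c))
    df.select((others.map(col) :+ array(nested.map(col): _*).as(listName)): _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder().master("local[*]").appName("nest-columns").getOrCreate()
    import spark.implicits._

    val x = Seq(("A", "B", "C", "D")).toDF("DOC", "A1", "A2", "A3")
    nest(x, "A", 100, "A_LIST").show()

    spark.stop()
  }
}
```

Compared with startsWith, generating the names explicitly does not pick up unrelated columns that merely share the prefix, and the array elements follow the numeric order 1..n rather than the DataFrame's column order.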

Upvotes: 2
