DataNoob

Reputation: 205

Spark DataFrame, how to aggregate a sequence of columns?

I have a DataFrame and I can aggregate with static column names, i.e.:

df.groupBy("_c0", "_c1", "_c2", "_c3", "_c4").agg(
concat_ws(",", collect_list("_c5")),
concat_ws(",", collect_list("_c6")))

This works fine, but how do I do the same when I am given a sequence of groupBy columns and a sequence of aggregation columns?

In other words, what if I have

val toGroupBy = Seq("_c0", "_c1", "_c2", "_c3", "_c4")
val toAggregate = Seq("_c5", "_c6")

and want to perform the above?

Upvotes: 1

Views: 1919

Answers (1)

Shaido

Reputation: 28322

To perform the same groupBy and aggregation using the sequences, you can do the following:

import org.apache.spark.sql.functions.{collect_list, concat_ws, expr}

val aggCols = toAggregate.map(c => expr(s"""concat_ws(",", collect_list($c))"""))
df.groupBy(toGroupBy.head, toGroupBy.tail: _*).agg(aggCols.head, aggCols.tail: _*)

The expr function parses a SQL expression string and returns the resulting Column. The varargs overloads of groupBy and agg are then applied to the two lists of columns.
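For reference, here is a minimal, self-contained sketch of the approach, assuming a local SparkSession and some made-up data shaped like the question's columns:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, concat_ws, expr}

val spark = SparkSession.builder().master("local[*]").appName("agg-demo").getOrCreate()
import spark.implicits._

// Hypothetical data matching the question's column layout (_c0.._c6).
val df = Seq(
  ("a", "b", "c", "d", "e", "x1", "y1"),
  ("a", "b", "c", "d", "e", "x2", "y2")
).toDF("_c0", "_c1", "_c2", "_c3", "_c4", "_c5", "_c6")

val toGroupBy = Seq("_c0", "_c1", "_c2", "_c3", "_c4")
val toAggregate = Seq("_c5", "_c6")

// One aggregation Column per name, then both lists splatted as varargs.
val aggCols = toAggregate.map(c => expr(s"""concat_ws(",", collect_list($c))"""))
val result = df.groupBy(toGroupBy.head, toGroupBy.tail: _*).agg(aggCols.head, aggCols.tail: _*)

result.show(false)
// Expected: one row per group, with _c5 collapsed to "x1,x2" and _c6 to "y1,y2".

If you prefer to avoid building SQL strings, the same aggregations can be written with the Column API, e.g. toAggregate.map(c => concat_ws(",", collect_list(c))), since collect_list also accepts a column name directly; both forms produce equivalent results.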

Upvotes: 1
