Reputation: 205
I have a DataFrame and I can aggregate it with static column names, i.e.:
df.groupBy("_c0", "_c1", "_c2", "_c3", "_c4").agg(
concat_ws(",", collect_list("_c5")),
concat_ws(",", collect_list("_c6")))
This works fine, but how do I do the same when I am given a sequence of group-by columns and a sequence of columns to aggregate?
In other words, what if I have
val toGroupBy = Seq("_c0", "_c1", "_c2", "_c3", "_c4")
val toAggregate = Seq("_c5", "_c6")
and want to perform the above?
Upvotes: 1
Views: 1919
Reputation: 28322
To perform the same groupBy and aggregation using the sequences, you can do the following:
import org.apache.spark.sql.functions.expr
val aggCols = toAggregate.map(c => expr(s"""concat_ws(",", collect_list($c))"""))
df.groupBy(toGroupBy.head, toGroupBy.tail:_*).agg(aggCols.head, aggCols.tail:_*)
The expr function parses a SQL expression string into a Column. groupBy and agg each take one required argument followed by varargs, so each sequence is split into its head and tail, and the tail is expanded with :_*.
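If you would rather not build SQL strings, the same aggregation can probably be written with the Column functions directly. The following is only a sketch: the object name DynamicAgg, the helper aggregateAsCsv, and the sample data are made up for illustration, and it assumes the aggregated columns are strings and that both sequences are non-empty (head fails on an empty Seq).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

object DynamicAgg {
  // Hypothetical helper (not part of Spark): groups by `toGroupBy` and joins
  // the values of each column in `toAggregate` into one comma-separated
  // string per group.
  def aggregateAsCsv(df: DataFrame,
                     toGroupBy: Seq[String],
                     toAggregate: Seq[String]): DataFrame = {
    require(toGroupBy.nonEmpty && toAggregate.nonEmpty,
      "both sequences must be non-empty")

    // One aggregate Column per input column, aliased back to the input name.
    val aggCols: Seq[Column] =
      toAggregate.map(c => concat_ws(",", collect_list(col(c))).as(c))

    // groupBy accepts Column varargs; agg needs a first Column plus varargs.
    df.groupBy(toGroupBy.map(c => col(c)): _*)
      .agg(aggCols.head, aggCols.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("dynamic-agg")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data using the column names from the question.
    val df = Seq(("a", "x", "1"), ("a", "y", "2"), ("b", "z", "3"))
      .toDF("_c0", "_c5", "_c6")

    aggregateAsCsv(df, Seq("_c0"), Seq("_c5", "_c6")).show()
    spark.stop()
  }
}
```

Using col(c) instead of interpolating the name into a SQL string avoids having to quote column names that contain spaces or dots. Note that collect_list does not guarantee the order of the collected values within a group.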
Upvotes: 1