spark dataframe - GroupBy aggregation

Question

I have a dataframe to aggregate one column based on the rest of the other columns. I do not want to give all those rest of the columns in groupBy with comma separated as I have about 30 columns. Could somebody tell me how can I do it in a way that looks more readable.

right now, am doing - df.groupBy("c1","c2","c3","c4","c5","c6","c7","c8","c9","c10",....).agg(c11)

I want to know if there is any better way..

Thanks, John

Chobeat · Accepted Answer

Specifying the columns is the clean way to do it but I believe you have quite a few options.

One of them is to go to Spark SQL and compose the query programmatically composing the string.

Another option could be to use the varargs : _* on a list of columns names, like this:

val cols = ...
df.groupBy( cols : _*).agg(...)

spark dataframe - GroupBy aggregation

Answers (2)

Related Questions