Reputation: 3168
From a given DataFrame, I'd like to group only a few rows together and keep the other rows unchanged in the same DataFrame.
My current solution is:
val aggregated = mydf.filter(col("check").equalTo("do_aggregate")).groupBy(...).agg(...)
val finalDF = aggregated.unionByName(mydf.filter(col("check").notEqual("do_aggregate")))
However, I'd like to find a more elegant and performant way.
Upvotes: 0
Views: 65
Reputation: 10693
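Group by a derived column whose value depends on the check: the actual grouping key for rows that should be aggregated, and a unique ID for every other row so that each of them ends up in its own single-row group.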
mydf.groupBy(when(col("check").equalTo("do_aggregate"), ...).otherwise(monotonically_increasing_id())).agg(...)
If you have a unique key in the DataFrame, use that instead of monotonically_increasing_id.
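To make the idea concrete, here is a minimal sketch of the derived-key approach. Only the `check` column and the `"do_aggregate"` value come from the question; the `key` and `amount` columns, the `group_key` helper column, the `grp_`/`row_` prefixes, and the `aggregateFlagged` function are hypothetical names chosen for illustration. The ID is cast to a string and both branches are prefixed so that the two branches of `when` have the same type and a real key can never collide with a generated ID.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object PartialAggregation {

  // Aggregates only the rows flagged with check == "do_aggregate";
  // every other row gets a unique group key and passes through unchanged.
  // Assumes (hypothetically) the columns: check (string), key (string), amount (long).
  def aggregateFlagged(df: DataFrame): DataFrame = {
    val withGroupKey = df.withColumn(
      "group_key",
      when(col("check") === "do_aggregate", concat(lit("grp_"), col("key")))
        // Unique per row, so these rows form single-row groups.
        .otherwise(concat(lit("row_"), monotonically_increasing_id().cast("string")))
    )

    withGroupKey
      .groupBy(col("group_key"))
      .agg(
        // check and key are constant within each group, so first() is safe here.
        first(col("check")).as("check"),
        first(col("key")).as("key"),
        sum(col("amount")).as("amount")
      )
      .drop("group_key")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("partial-aggregation-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val mydf = Seq(
      ("do_aggregate", "a", 1L),
      ("do_aggregate", "a", 2L),
      ("do_aggregate", "b", 5L),
      ("keep", "a", 10L),
      ("keep", "a", 20L)
    ).toDF("check", "key", "amount")

    aggregateFlagged(mydf).show()
    spark.stop()
  }
}
```

With this sample data, the two flagged `a` rows collapse into one row, the flagged `b` row forms its own group, and both `keep` rows survive separately even though they share the same `key`. One behavioural difference from the filter-plus-union version: rows where `check` is null fall into the `otherwise` branch and are kept, whereas both `filter` calls in the question would drop them.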
Upvotes: 1