Rolintocour

Reputation: 3168

Spark: group only part of the rows in a DataFrame

From a given DataFrame, I'd like to group only a few of the rows together and keep the other rows in the same DataFrame.

My current solution is:

val aggregated = mydf.filter(col("check").equalTo("do_aggregate")).groupBy(...).agg()
val finalDF = aggregated.unionByName(mydf.filter(col("check").notEqual("do_aggregate")))
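
For concreteness, a self-contained sketch of this two-step approach; the schema (key, value, check) and the sum aggregation are assumed for illustration:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Toy data: rows flagged "do_aggregate" should be grouped, the rest kept as-is.
val mydf = Seq(
  ("a", 1, "do_aggregate"),
  ("a", 2, "do_aggregate"),
  ("b", 3, "keep"),
  ("c", 4, "keep")
).toDF("key", "value", "check")

// Aggregate only the flagged rows...
val aggregated = mydf
  .filter(col("check").equalTo("do_aggregate"))
  .groupBy("key")
  .agg(sum("value").as("value"), first("check").as("check"))

// ...then union the untouched rows back in.
val finalDF = aggregated.unionByName(mydf.filter(col("check").notEqual("do_aggregate")))
finalDF.show()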

However, I'd like to find a more elegant and performant way.

Upvotes: 0

Views: 65

Answers (1)

Kombajn zbożowy

Reputation: 10693

Use a derived column to group by, depending on the check.

mydf.groupBy(when(col("check").equalTo("do_aggregate"), ...).otherwise(monotonically_increasing_id())).agg(...)

If you have a unique key in the DataFrame, use that instead of monotonically_increasing_id().
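
A minimal sketch of this single-pass approach, reusing the toy mydf and the sum aggregation assumed in the question above; the cast to string makes both branches of the when return the same type:

import org.apache.spark.sql.functions._

// Flagged rows collapse into one group per key; every other row gets a
// unique group of its own and therefore passes through unchanged.
val finalDF = mydf
  .groupBy(
    when(col("check").equalTo("do_aggregate"), col("key"))
      .otherwise(monotonically_increasing_id().cast("string"))
      .as("group_key")
  )
  .agg(
    first("key").as("key"),
    sum("value").as("value"),
    first("check").as("check")
  )
  .drop("group_key")
finalDF.show()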

Upvotes: 1
