Spark

Reputation: 2487

Difference between groupByKey($"col") and groupBy($"col") in Spark Scala

What is the fundamental difference between .groupByKey and .groupBy when I pass a column of a DataFrame as the parameter?

Which one is more time-efficient, and what exactly does each one do? Could someone please explain in detail? I went through some examples, but they were confusing.

Upvotes: 0

Views: 4160

Answers (1)

user10546212


There is no groupByKey method that takes a Column as an argument. There are only methods that take functions, either:

def groupByKey[K](func: MapFunction[T, K], encoder: Encoder[K]): KeyValueGroupedDataset[K, T] 

or

def groupByKey[K](func: (T) ⇒ K)(implicit arg0: Encoder[K]): KeyValueGroupedDataset[K, T] 

Compare this with groupBy, which takes Columns:

def groupBy(cols: Column*): RelationalGroupedDataset 

or String

def groupBy(col1: String, cols: String*): RelationalGroupedDataset 

The difference should be obvious: the first two return KeyValueGroupedDataset (intended for processing with the "functional", strongly typed API, such as mapGroups or reduceGroups), while the latter two return RelationalGroupedDataset (intended for processing with the SQL-like API).
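To make the contrast concrete, here is a minimal sketch that computes the same per-key total through both APIs. The case class Sale, the object GroupingSketch, its two helper methods, and the sample data are all hypothetical names chosen for illustration; it assumes a Spark 2.x or later dependency on the classpath and a local master.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record type, used only for this illustration.
case class Sale(region: String, amount: Long)

object GroupingSketch {

  // Untyped, SQL-like path: groupBy(Column*) returns a RelationalGroupedDataset,
  // and agg(...) turns it back into a DataFrame.
  def totalsWithGroupBy(sales: Dataset[Sale]): Map[String, Long] = {
    val spark = sales.sparkSession
    import spark.implicits._
    sales
      .groupBy($"region")
      .agg(sum($"amount").as("total"))
      .as[(String, Long)] // tuple encoders map columns by position
      .collect()
      .toMap
  }

  // Typed, functional path: groupByKey takes a function T => K (not a Column)
  // and returns a KeyValueGroupedDataset[String, Sale].
  def totalsWithGroupByKey(sales: Dataset[Sale]): Map[String, Long] = {
    val spark = sales.sparkSession
    import spark.implicits._
    sales
      .groupByKey(_.region)
      .mapGroups((region, rows) => (region, rows.map(_.amount).sum))
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is good enough for a demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("grouping-sketch")
      .getOrCreate()
    import spark.implicits._

    val sales = Seq(Sale("eu", 10L), Sale("us", 5L), Sale("eu", 7L)).toDS()

    println(totalsWithGroupBy(sales))
    println(totalsWithGroupByKey(sales))

    spark.stop()
  }
}
```

Both helpers should yield the same totals for the sample data (17 for "eu" and 5 for "us"). As for efficiency, the Spark API documentation notes that mapGroups does not support partial aggregation and therefore shuffles all the data in the Dataset, so for plain aggregations the groupBy path with built-in functions (or reduceGroups or an Aggregator on the typed side) is usually the safer default.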


Upvotes: 3
