Hoori M.

Reputation: 730

How to get max value of each column?

I want to get max value for each column of a dataframe in Spark. My code works just for one column (e.g. first):

val col = df.columns(0)
val Row(maxValue: Int) = df.agg(max(col)).head()

I don't know how to combine foreach with the code I have so that I can get the max value for every column in the dataframe. (I don't know how many columns the dataframe has, or what their names are.)

Thanks.

Upvotes: 3

Views: 3588

Answers (1)

Tzach Zohar

Reputation: 37852

foreach is rarely the right tool when you want to transform a collection (in this case, an array of column names) into something else (in this case, their maximum values). Instead, use map, and then pass the result to agg:

import org.apache.spark.sql.{Column, Row}
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1,3), (3, 1), (2, 2)).toDF("a", "b")

// map each column name to a Column representing its maximum
val maxCols: Array[Column] = df.columns.map(max)

// aggregate all at once (have to separate first from rest due to agg's signature):
val row: Row = df.agg(maxCols.head, maxCols.tail: _*).head
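
If you need the maximums keyed by column name rather than by position, one option (a minimal sketch; maxes is an illustrative name) is to zip the column names with the row's values:

// e.g. Map(a -> 3, b -> 3) for the example dataframe above
val maxes: Map[String, Any] = df.columns.zip(row.toSeq).toMap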

EDIT: as @user8371915 reminds us, there's a much shorter version:

val row: Row = df.groupBy().max().head
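
Note that the no-argument max() aggregates every numeric column at once. If you only want some of them, you can pass column names explicitly (a sketch; row2 is an illustrative name):

// max of column "a" only
val row2: Row = df.groupBy().max("a").head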

Upvotes: 5
