Reputation: 730
I want to get the max value for each column of a DataFrame in Spark. My code works for just one column (e.g. the first):
import org.apache.spark.sql.functions.max
import org.apache.spark.sql.Row
val col = df.columns(0)
val Row(maxValue: Int) = df.agg(max(col)).head()
I don't know how to combine foreach with the code I have so that I can get the max value for every column in the DataFrame. (I don't know how many columns the DataFrame has or what their names are.)
Thanks.
Upvotes: 3
Views: 3588
Reputation: 37852
foreach is rarely the right tool when you want to transform a collection (in this case, an array of column names) into something else (in this case, their maximum values). Instead, use map, and then pass the result to agg:
import org.apache.spark.sql.{Column, Row}
import org.apache.spark.sql.functions._
import spark.implicits._
val df = Seq((1,3), (3, 1), (2, 2)).toDF("a", "b")
// map columns into columns representing their maximums
val maxCols: Array[Column] = df.columns.map(max)
// aggregate all at once (have to separate first from rest due to agg's signature):
val row: Row = df.agg(maxCols.head, maxCols.tail: _*).head
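If you then want each maximum keyed by its column name, one possible follow-up (a small sketch, not part of the original answer) is to zip the column names with the row's values:
// Sketch: pair each column name with its computed maximum.
// Values come back untyped, hence Any.
val maxByColumn: Map[String, Any] = df.columns.zip(row.toSeq).toMap
// maxByColumn: Map(a -> 3, b -> 3)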
EDIT: as @user8371915 reminds us, there's a much shorter version:
val row: Row = df.groupBy().max().head
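Note that this version names its output columns max(a), max(b), and so on, one per numeric column, in column order. A minimal sketch of reading the values back out (assuming Int columns, as in the example above):
val maxA = row.getInt(0) // 3, i.e. max("a")
val maxB = row.getInt(1) // 3, i.e. max("b")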
Upvotes: 5