Reputation: 730
How can I get the maximum value of columns of different types (string and numerical) in a Spark DataFrame using Scala?
Let's say this is my data:
+----+------+------+------+
|name|value1|value2|string|
+----+------+------+------+
|   A|     7|     9|   "a"|
|   A|     1|    10|  null|
|   B|     4|     4|   "b"|
|   B|     3|     6|  null|
+----+------+------+------+
and the desired outcome is:
+----+------+------+------+
|name|value1|value2|string|
+----+------+------+------+
|   A|     7|    10|   "a"|
|   B|     4|     6|   "b"|
+----+------+------+------+
Is there a function like pandas' apply(max, axis=0), or do I have to write a UDF?
What I can do is df.groupBy("name").max("value1"), but I cannot chain two max calls in a row, and passing a Sequence to the max() function does not work either.
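Concretely, this is roughly where I'm stuck (the commented-out lines fail to compile):

// Works: a single aggregate per groupBy
val singleMax = df.groupBy("name").max("value1")

// Does not compile: max() is not defined on the resulting DataFrame
// df.groupBy("name").max("value1").max("value2")

// Does not compile: max() expects column names as varargs, not a Seq
// df.groupBy("name").max(Seq("value1", "value2"))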
Any ideas on how to solve this quickly?
Upvotes: 1
Views: 3738