Babu

Reputation: 891

Doing aggregations with the when method in Spark

Data

first_name,id,age
abc,1,53
bcd,2,68
abc,3,68

I loaded this data into a DataFrame called personDF.
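For reference, personDF can be built roughly like this (a minimal sketch, assuming a SparkSession with spark.implicits._ in scope; the imports also cover the functions used in the query below):

import org.apache.spark.sql.functions.{lower, max, min, when}
import spark.implicits._

// Sample rows from above, turned into a DataFrame named personDF
val personDF = Seq(
  ("abc", 1, 53),
  ("bcd", 2, 68),
  ("abc", 3, 68)
).toDF("first_name", "id", "age")

Then I ran the following aggregation: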

personDF.groupBy("id").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()

I want to get either the min age or the max age depending on the when condition, but it's not working.

How can I do this?

Upvotes: 0

Views: 103

Answers (2)

Shashank

Reputation: 370

You need to group by the 'first_name' column for this to work:

personDF.groupBy("first_name").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()

+----------+-------+
|first_name|min_age|
+----------+-------+
|       abc|     53|
|       bcd|     68|
+----------+-------+

Upvotes: 1

Kunal Thakare

Reputation: 31

You can't use an aggregate function on the same column that you are grouping the DataFrame on. It won't work because each group will contain only one value of the column you grouped by.
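To see the effect, here is a minimal sketch reusing personDF and the imports from the question: id is unique in this data, so grouping by it leaves exactly one row per group and min and max of age coincide.

personDF.groupBy("id")
  .agg(min($"age").alias("min_age"), max($"age").alias("max_age"))
  .show()

// Each id forms its own single-row group, so min_age equals max_age.
// Output should look something like this (row order may vary):
// +---+-------+-------+
// | id|min_age|max_age|
// +---+-------+-------+
// |  1|     53|     53|
// |  2|     68|     68|
// |  3|     68|     68|
// +---+-------+-------+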

Please check out this link for better understanding.

Upvotes: 0
