Reputation: 891
Data
first_name,id,age
abc,1,53
bcd,2,68
abc,3,68
Made this data into personDF
dataframe
personDF.groupBy("id").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()
I want to get the min age and max age based on the when condition. It's not working.
Please let me know how can I do this?
Upvotes: 0
Views: 103
Reputation: 370
You need to group by 'first_name' column for this to work:
df.groupBy("first_name").agg(when(lower($"first_name")==="abc",min($"age")).otherwise(max($"age")).alias("min_age")).show()
+----------+-------+
|first_name|min_age|
+----------+-------+
| abc| 53|
| bcd| 68|
+----------+-------+
Upvotes: 1
Reputation: 31
You can't use an aggregate function on the same column on which you are grouping the dataframe. This will not work as one group will have only one value of the column on which you have grouped.
Please check out this link for better understanding.
Upvotes: 0