Reputation: 383
I have the below dataframe with me .
+-------+---+----+
|Company|EMP|Flag|
+-------+---+----+
| M| c1| Y|
| M| c1| Y|
| M| c2| N|
| M| c2| N|
| M| c3| Y|
| M| c3| Y|
| M| c4| N|
| M| c4| N|
| M| c5| Y|
| M| c5| Y|
| M| c6| Y|
+-------+---+----+
Created by -
val df1=Seq(
("M","c1","Y"),
("M","c1","Y"),
("M","c2","N"),
("M","c2","N"),
("M","c3","Y"),
("M","c3","Y"),
("M","c4","N"),
("M","c4","N"),
("M","c5","Y"),
("M","c5","Y"),
("M","c6","Y")
)toDF("Company","EMP","Flag")
How can I take the distinct count EMP when FLAG =Y and FLAG = N. Once, an EMP has got a flag, it wont change again.I can achieve this with distinct. But is there any way to achive this without distinct(this is to avoid an extra join in the code)
Expected Output :
+---+---+---+---------+----------+
| M| Y| N|Total_ROWs|Unique_Emp|
+---+---+---+---------+----------+
| M| 4| 2| 11| 6|
+---+---+---+---------+----------+
Upvotes: 0
Views: 483
Reputation: 13551
What about this?
df1.groupBy("Company", "EMP", "Flag")
.agg(count("Company").as("Row"))
.groupBy("Company", "EMP", "Flag")
.agg(count("Flag").as("YN"), sum("Row").as("Row"))
.groupBy("Company")
.agg(count(when($"Flag" === "Y", 1)).as("Y"), count(when($"Flag" === "N", 1)).as("N"), sum("Row").as("Total_ROWs"), count("EMP").as("Unique_EMP"))
.show
+-------+---+---+----------+----------+
|Company| Y| N|Total_ROWs|Unique_EMP|
+-------+---+---+----------+----------+
| M| 4| 2| 11| 6|
+-------+---+---+----------+----------+
Upvotes: 1