Jitesh Sharma

Reputation: 75

Condition while aggregating in Spark

This question is about conditional aggregation in SQL. Normally we add conditions with a 'case' expression in the select clause, but such a condition only checks the row under consideration. Consider the data below:

BEGIN TRANSACTION;

/* Create a table called NAMES */
CREATE TABLE NAMES(M CHAR, D CHAR, A INTEGER);

/* Create few records in this table */
INSERT INTO NAMES VALUES('M1','Y',2);
INSERT INTO NAMES VALUES('M1','Y',3);
INSERT INTO NAMES VALUES('M2','Y',2);
INSERT INTO NAMES VALUES('M2',null,3);
INSERT INTO NAMES VALUES('M3',null,2);
INSERT INTO NAMES VALUES('M3',null,3);
COMMIT;
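
For anyone trying this in Spark directly, the same data can be registered as a temporary view. This is only a sketch, assuming a spark-shell session where the SparkSession is available as 'spark':

import spark.implicits._

// Rebuild the NAMES table as a DataFrame; D is an Option so the
// null rows from the inserts above carry over as SQL NULLs.
val names = Seq(
  ("M1", Some("Y"), 2),
  ("M1", Some("Y"), 3),
  ("M2", Some("Y"), 2),
  ("M2", None,      3),
  ("M3", None,      2),
  ("M3", None,      3)
).toDF("M", "D", "A")

// Expose it to Spark SQL under the same name as the table above.
names.createOrReplaceTempView("NAMES")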

This query groups by column 'M', checks each record individually for whether column 'D' is 'Y' (its only non-null value here), and applies a sum aggregation to column 'A':

select M, sum(case when D = 'Y' then 0 else A end) from NAMES group by M;

Output for this query is:

M1|0
M2|3
M3|5

But what if we want to check column 'D' across all records in the group: if any record in the group has 'D' = 'Y', do not perform the 'sum' aggregation at all and return 0 instead. In short, the expected output for the above data is:

M1|0
M2|0
M3|5

Answers in Spark SQL are highly appreciated.

Upvotes: 2

Views: 820

Answers (1)

Gordon Linoff

Reputation: 1270443

You can use another case expression:

select M,
       (case when max(D) = 'Y'  -- at least one row in the group has 'Y'
             then 0
             else sum(A)
        end)
from NAMES
group by M;
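
This works because max(D) ignores nulls: it equals 'Y' exactly when some row in the group carries 'Y', and for an all-null group the comparison itself is null, so the else branch sums 'A'. The query above runs unchanged in Spark SQL via spark.sql(...). For completeness, a sketch of the equivalent DataFrame API aggregation, assuming the 'names' DataFrame built earlier in the question:

import org.apache.spark.sql.functions.{col, max, sum, when}

// Return 0 for a group when any of its rows has D = 'Y';
// otherwise sum column A across the group.
val result = names
  .groupBy("M")
  .agg(when(max(col("D")) === "Y", 0).otherwise(sum(col("A"))).as("total"))

result.orderBy("M").show()
// M1 -> 0, M2 -> 0, M3 -> 5, matching the expected output.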

Upvotes: 1
