Reputation: 57
I have the following 3 calculated columns which define the training status for each employee:
training status in 2023 calculated as if [test attribute] = 'e1' and year([assigned date]) = 2023 then 'overdue' elseif [test attribute] = 'e2' and year([assigned date]) = 2023 then 'late completion' else '' end;
training status in 2022 calculated as if [test attribute] = 'e1' and year([assigned date]) = 2022 then 'overdue' ;
training status in 2021 calculated as if [test attribute] = 'e1' and year([assigned date]) = 2021 then 'overdue';
I want to get the exceptions count where if training status in 2023 = e1 or e2 then that population and if training status for 2021 or training status for 2022 = e1 then from those ids.
In the sample data below, the exceptions count should be 3 (Henry, Sam and Tom). We don't count Mike because his training status in 2023 is neither e1 nor e2.
CREATE TABLE users (
id INT,
name VARCHAR(255) NOT NULL,
test attribute VARCHAR(255),
assigned_date DATE,
type VARCHAR(255) NOT NULL
);
INSERT INTO users
(id, name, test attribute, assigned_date, type ) VALUES
(102, 'Henry', 'E1', '2023-05-01', 'comp'),
(102, 'Henry', 'E1', '2022-01-30', 'comp'),
(102, 'Henry', 'E2', '2023-05-01', 'comp'),
(101, 'Mike', 'X', '2023-10-21', 'other'),
(101, 'Mike', 'E1', '2022-06-19', 'other'),
(100, 'Sam', 'E2', '2023-11-12', 'comp'),
(100, 'Sam', 'E1', '2021-09-18', 'other'),
(100, 'Sam', 'E1', '2023-11-12', 'other'),
(103, 'Tom', 'X', '2023-11-12', 'other'),
(103, 'Tom', 'E1', '2021-09-18', 'comp'),
(103, 'Tom', 'E2', '2023-11-12', 'other');
SELECT * FROM users;
I tried the following calculations but it is not giving me accurate count. I have a calculated field called exceptions count in Tableau, where I am trying to get the count and this is what I have tried:
If ([test attribute] IN ('e1', 'e2') and year([assigned date]) = 2023) and if [test attribute] = 'e1' and year([assigned date]) IN (2022, 2021) then 1 else 0 end;
What is wrong?
Upvotes: 0
Views: 140
Reputation: 27333
You can use a sub-query (CTE in my example) to calculate the error conditions then filter by them. And then use count(distinct id)
to get the count.
with cte as (
select id, [Name]
, case when exists (select 1 from users c2 where c2.id = c1.id and year(assigned_date) = 2023 and training_category in ('E1','E2')) then 1 else 0 end Has2023Exception
, case when exists (select 1 from users c2 where c2.id = c1.id and year(assigned_date) in (2021, 2022) and training_category in ('E1')) then 1 else 0 end Has2022or2021Exception
from users c1
)
select count(distinct id)
from cte
where Has2023Exception = 1 and Has2022or2021Exception = 1;
Returns
Count |
---|
3 |
Or you could use the having
clause to filter the rows, then count them e.g.
with cte as (
select 1 Num
from users
group by id, [Name]
having max(case when year(assigned_date) = 2023 and training_category in ('E1','E2') then 1 else 0 end) = 1
and max(case when year(assigned_date) in (2021, 2022) and training_category in ('E1') then 1 else 0 end) = 1
)
select count(*)
from cte;
Which might not be much smaller or simpler, but performs better due to only a single access of the table.
Upvotes: 1
Reputation: 35603
SQL uses "case expressions" instead of if/else syntax (although you can use if/else in T-SQL) but the syntax of a case expression is something you should probably get used to if using SQL. The syntax is very similar to your stated requirement, e.g:
select
case
when [training category] = 'E1' and assigned_year = 2023 then 'overdue'
when [training category] = 'E2' and assigned_year = 2023 then 'late completion'
else ''
end as calc1
, case
when [training category] = 'E1' and assigned_year < 2023 then 'overdue'
end as calc2
, *
from myusers
outer apply (select year(try_cast([assigned date] as date)) as assigned_year) oa1
see this working here
calc1 | calc2 | id | name | training category | assigned date | type | assigned_year |
---|---|---|---|---|---|---|---|
overdue | 102 | Henry | E1 | 2023-05-01 | comp | 2023 | |
overdue | 102 | Henry | E1 | 2022-01-30 | comp | 2022 | |
late completion | 102 | Henry | E2 | 2023-05-01 | comp | 2023 | |
101 | Mike | X | 2023-10-21 | other | 2023 | ||
overdue | 101 | Mike | E1 | 2022-06-19 | other | 2022 | |
late completion | 100 | Sam | E2 | 2023-11-12 | comp | 2023 | |
overdue | 100 | Sam | E1 | 2021-09-18 | other | 2021 | |
overdue | 100 | Sam | E1 | 2023-11-12 | other | 2023 | |
103 | Tom | X | 2023-11-12 | other | 2023 | ||
overdue | 104 | Tom | E1 | 2021-09-18 | comp | 2021 | |
late completion | 103 | Tom | E2 | 2023-11-12 | other | 2023 |
In your sample data you show the "assigned date" as a varchar, so I have included a way to cast that to a date. If the date is already a date column then the try_cast([assigned ddate] as date)
isn't required.
Note: A case expression starts "top to bottom" so if the first "when" is true it ignores any following "when" or "else" conditions, if the first is false then it evaluates the next "when", and so on until finished (so it doesn't need "elseif" but can achieve the equivalent).
Small point: Whilst the correct term is "case expression" you will regrettably find many references to "case statement".
==================================================================================
I think you may be able to combine "calc1" and "calc2" by using "less than or equal" for the overdue conditions:
select
case
when [training category] = 'E1' and assigned_year <= 2023 then 'overdue'
when [training category] = 'E2' and assigned_year = 2023 then 'late completion'
else null
end as calc1
, *
from myusers
outer apply (select year(try_cast([assigned date] as date)) as assigned_year) oa1
Upvotes: 0