Reputation: 7
CONTEXT: I am pretty new to coding, starting with python for about 2 months before switching gears to SQL for the past few months. I am pretty much self-taught, using youtube videos and taking classes through Udemy(just as preference in case this question sounds silly). I am working on a practice problem. The problem asked:
“Create a visualization that provides a breakdown between the male and female employees working in the company each year, starting from 1990”. (The visualization part is for tableau, but for now I am first creating the query for the output needed in SQL).
I know that in general, WHERE is used for a general condition for all the records in the query vs HAVING is used to further filtered and can be used with aggregate function, while WHERE cannot be used with aggregate functions. I used WHERE in my code vs the solution used HAVING, with some differences in each code. They both produced the same results.
QUESTION: Is my code acceptable? Was it a fluke that I got the correct results, is it correct but maybe not the most "formal" or "professional" way of doing it (class focuses heavily on making sure you have clean code and use professional standards to increase readability and such), or is it completely fine and just two different ways to solve the same issue? The question didn't explicitely say that they were multiple solutions to it.
MY CODE:
SELECT
YEAR(d.from_date) AS year,
e.gender,
COUNT(e.emp_no) AS employees
FROM
t_employees e
JOIN
t_dept_emp d ON e.emp_no = d.emp_no
WHERE
d.from_date BETWEEN '1989-12-31' AND sysdate()
GROUP BY year, e.gender
ORDER BY year;
THE SOLUTION:
SELECT
YEAR(d.from_date) AS calendar_year,
gender,
COUNT(e.emp_no) AS Employees
FROM
t_employees e
JOIN
t_dept_emp d ON e.emp_no = d.emp_no
GROUP BY calendar_year, e.gender
HAVING calendar_year >= 1990
ORDER BY calendar_year;
Thank you in advance!
Upvotes: 0
Views: 47
Reputation: 781751
The two queries are equivalent. If you use a WHERE
filter that filters the same expression as one of the columns in the GROUP BY
, the queries produce the same result. It's likely that your solution is more efficient, because aggregation requires memory to store all the groups. Filtering out those rows during the early WHERE
processing reduces that memory use (ideally the database engine would be able to detect this equivalence and optimize the query either way, but MySQL's optimizer may not be that smart).
Upvotes: 2