Reputation: 101
With this data table "table":
and a table "ID" of well known newID value (1,2,3)
I would like to find all the months in which some newID value are missing. The result should only include months that that have at least 1 newId
.
This is the expected result for the data listed above:
How can I achieve this task ?
Table Schema: (fiddle) http://sqlfiddle.com/#!9/72af42
CREATE TABLE `ID`(
Id int unsigned not null primary key
);
INSERT INTO `ID` (Id) VALUES (1),(2),(3);
CREATE TABLE `table`(
recId int unsigned not null auto_increment primary key,
Data DateTime not null,
newID int not null
);
INSERT INTO `table` (`Data`,`newID`) VALUES
('2017-12-06',1),
('2017-12-06',3),
('2017-11-16',1),
('2017-11-16',2),
('2017-11-16',3),
('2017-10-05',2),
('2017-10-05',3),
('2017-10-03',2),
('2017-10-03',3),
('2017-08-16',1),
('2017-08-16',2),
('2017-08-16',3),
('2017-05-05',1),
('2017-05-05',2),
('2017-05-05',3);
Upvotes: 1
Views: 104
Reputation: 16554
Because you only need the missing ID
records that have a correlated Month
, this is a simple Non-Correlated sub-query scenario.
In SQL a Non-Correlated sub-query can be modeled either using a
NOT EXISTS
expression in theWHERE
clause, or you can use aLEFT OUTER JOIN
and only include the NULL results for the columns in the joined table, as this would indicate the records where no match was found
This query in only complicated by the fact that you want to evaluate the MONTH
component of the date, not the explicit date value. SQL gives use all the necessary tools for this, we can even format the output that you desire:
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT YEAR(Data) AS Year, MONTH(Data) AS month
FROM `table`
GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t ON YEAR(t.Data) = m.year AND MONTH(t.Data) = m.month AND t.newId = i.Id
WHERE t.Data IS NULL
ORDER BY m.Year DESC, m.month DESC
Have a look at this fiddle: http://sqlfiddle.com/#!9/72af42/2
Choosing between
WHERE NOT EXISTS
ANDLEFT OUTER JOIN
can affect performance slightly, but the affect will depend on your query, your RDBMS and the available indexes. I personally use theJOIN
syntax first because IMO it is simpler to maintain, but you use your own discretion.There is a lot of talk at least in MS SQL that
NOT EXISTS
should be faster thanJOIN
but if performance is an issue you for this specific query you should look at storing theyear
andmonth
columns as persisted values so that they can be indexed and to reduce the function evaluatations.because it will evaluate less lookups.
For a comparison, this is the equivalent WHERE NOT EXISTS
query: http://sqlfiddle.com/#!9/72af42/5
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT YEAR(Data) AS Year, MONTH(Data) AS month
FROM `table`
GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
WHERE NOT EXISTS (
SELECT *
FROM `table` t
WHERE YEAR(t.Data) = m.year
AND MONTH(t.Data) = m.month
AND t.newId = i.Id
)
ORDER BY m.Year DESC, m.month DESC
If we pre-evaluate the YEAR()
and MONTH()
and store the results in the table directly, then the query speed will improve but we can also add indexes to super-charge it.
Consider the over all PROs and CONs before going this far...
- Do you really need this level of optimisation?
- How often is the query going to be executed?
- Can you change the application logic to use a more appropriate
WHERE
clause to restrict the scope of the data instead?
One solution to this is to create and manage a materialised view. This is a DW technique that effectively allows you to define a view but have it executed on a regular basis and stored into it's own table space.
A materialised View does not optimise your query, but it allows you to execute complex and long-running query once, so that the results can be queried directly like a normal table, without having to re-evaluate column expressions.
Your data and the type of query looks like a good candidate for materialized view because it is querying historical data that has a zero or very low rate of change, only new rows are updated, and we probably do not care for the current month results. In this case if you end up running the query many times, and the result is staying more or less constant, then why not run the query as a process, say every month and store the results in a purpose built table, then your application can query that table frequently with lightning fast results.
MySQL does not support Materialized Views, but you can replicate the concept as explained above in your application logic, some other RDBMS provide this OOTB, its the concept that should be considered.
You could just add the extra columns to your table and maintain these columns from the user/application logic, but that is not very reliable, unless you trust your application developers and the app is the only process that will be updating this table.
Computed columns are perfect for reliability in this scenario, but they will only help us with performance if you can persist the value to the column store. (the default state of a computed column is that the expression will be evaluated at execution time, which offers little benefit over the current query)
Again this is where MySQL will let you down, many other RDBMS offer simpler ways to do this, you need MySQL v5.7 for this to work
ALTER TABLE `table` ADD `year` GENERATED ALWAYS AS (YEAR(Data)) STORED; ALTER TABLE `table` ADD `month` GENERATED ALWAYS AS (MONTH(Data)) STORED;
Your other option is to add the columns, and then use triggers to maintain the values, MySQL doesn't make this easy, but it can work
Add the columns to your table:
ALTER TABLE `table` ADD (`year` int NULL);
ALTER TABLE `table` ADD (`month` int NULL);
Create Triggers to manage the values in these columns so the user cannot override them:
DELIMITER $$
CREATE TRIGGER persist_index_values_insert
BEFORE INSERT ON `table` FOR EACH ROW
BEGIN
SET new.year= YEAR(new.Data);
SET new.month = MONTH(NEW.Data);
END$$
CREATE TRIGGER persist_index_values_update
BEFORE UPDATE ON `table` FOR EACH ROW
BEGIN
SET NEW.year = YEAR(NEW.Data);
SET NEW.month = MONTH(NEW.Data);
END$$
DELIMITER ;
Simpler Query:
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT `year`, `month`
FROM `table`
GROUP BY `year`, `month`
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t ON t.year = m.year AND t.month = m.month AND t.newId = i.Id
WHERE t.Data IS NULL
ORDER BY m.Year DESC, m.month DESC
Indexes can now be applied as needed, you should consult your query execution plans for guidance, but I would suggest you need one index for year
and month
and newId
as a minimum:
CREATE INDEX IX_TABLE_YEAR_MONTH_NEWID ON `table` (`year`,`month`,'newId');
Upvotes: 0