Reputation: 21
I am trying to do #8 on this problem set from sqlzoo (https://sqlzoo.net/wiki/Window_LAG#LAG_using_a_JOIN).
The question is "For each country that has had at last 1000 new cases in a single day, show the date of the peak number of new cases."
The table covid gives the number of covid cases, deaths, and recoveries per day by country as so:
+-------------+-------------------------------+-----------+--------+-----------+
| Name | whn | confirmed | deaths | recovered |
+-------------+-------------------------------+-----------+--------+-----------+
| Afghanistan | Sun, 01 Mar 2020 00:00:00 GMT | 1 | 0 | 0 |
| Albania | Sun, 01 Mar 2020 00:00:00 GMT | 0 | 0 | 0 |
| Algeria | Sun, 01 Mar 2020 00:00:00 GMT | 1 | 0 | 0 |
+-------------+-------------------------------+-----------+--------+-----------+
Currently I have this code:
SELECT c.name, DATE_FORMAT(c.whn,'%Y-%m-%d') as this, d.peak
from ( select tw.name, max(tw.confirmed-lw.confirmed) as peak
FROM covid tw LEFT JOIN covid lw ON
DATE_ADD(lw.whn, INTERVAL 1 DAY) = tw.whn
AND tw.name=lw.name
where tw.confirmed-lw.confirmed > 1000
group by tw.name) d
join covid as c
on d.name = c.name
group by name
which gives me each country, the date, and the peak number of cases. However, the date is showing the first day for each country when the cases are above a 1000. How would I get the date where there is the peak number of cases?
| Name | this | peak |
|---------|------------|------|
| Austria | 2020-03-26 | 1321 |
| Belarus | 2020-04-20 | 1485 |
| Belgium | 2020-03-26 | 2454 |
Upvotes: 2
Views: 759
Reputation: 222592
You can compute the number of new cases by comparing confirmed
across subsequent days; for this, lag()
comes handy:
select
t.*,
confirmed - lag(confirmed, 1, 0) over(partition by name order by whn) new_cases
from mytable t
This assumes that each country has one record per day. You can then rank the records of each country by that, and filter on the top ranked day per country:
select *
from (
select
t*,
rank() over(partition by name order by new_cases desc) rn
from (
select
t.*,
confirmed - lag(confirmed, 1, 0) over(partition by name order by whn) new_cases
from mytable t
) t
where new_cases > 1000
) t
where rn = 1
Upvotes: 2
Reputation: 11
It doesn't look good, but it works for MySQL. For some reasons, other answers didn't pass.
select t2.name,DATE_FORMAT(t3.whn,'%Y-%m-%d'),t2.peakNewCases
from
(select a.name, max(a.confirmed-b.confirmed) as peakNewCases
from covid a
left join covid b
on DATE_ADD(b.whn, interval 1 day)=a.whn
and
a.name=b.name
where a.confirmed-b.confirmed>999
group by name
) t2
join
(select name, whn, confirmed-lag(confirmed,1) over (partition by name order by confirmed) as peakNewCases
from covid
) t3
on t2.peakNewCases=t3.peakNewCases
and
t2.name=t3.name
order by t3.whn
Upvotes: 1
Reputation: 1
SELECT
total_cases - LAG(total_cases,1) OVER (PARTITION BY state ORDER BY date)
AS new_cases,
total_cases,
state,
date
FROM statistics
Upvotes: 0
Reputation: 21
Use RANK() OVER (PARTITION BY name ORDER BY peak) AS rank
to rank the number of new cases everday of each country and then select the ones that rank = 1
in the outer query to find the peak. Here is my complete query:
SELECT name, date, peak FROM
(SELECT name, date, peak, RANK() OVER (PARTITION BY name ORDER BY peak DESC) as rank
FROM
(SELECT name, DATE_FORMAT(whn,'%Y-%m-%d') AS date,
confirmed - (LAG(confirmed, 1) OVER (PARTITION BY name ORDER BY whn)) AS peak
FROM
covid)
TAB WHERE peak >= 1000)
TAB WHERE rank = 1
ORDER BY date
It seems like the order of output needs to be adjusted to fit perfectly to the answer, but I think it works.
Upvotes: 0