Reputation: 1716
I have a table with three columns:
Each student's enrollment period is provided by start_yearmonthdate and end_yearmonthdate. Where the values are in the yymmdd format. Also, if a student has not specified an end date yet, the end_yearmonthdate will show '000000'. You can consider the timeframe is in the year of 2022.
I want to know for each month, how many students are enrolled. For example, for student 100001, Jan, Feb, and Mar are all the enrollment months for this student.
Here is the table:
student_id | start_yearmonthdate | end_yearmonthdate |
---|---|---|
100001 | 220101 | 220331 |
100002 | 220201 | 220731 |
100003 | 220101 | 221231 |
100004 | 220701 | 221031 |
100005 | 220601 | 220930 |
100006 | 220101 | 221231 |
100006 | 220201 | 221231 |
100007 | 220701 | 000000 |
100008 | 220601 | 000000 |
You can use sqlfiddle.com to build the table:
create table training (
student_id int(6),
start_yearmonthdate varchar(30),
end_yearmonthdate varchar(30)
);
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100001, '220101', '220331');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100002, '220201', '220731');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100003, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100004, '220701', '221031');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100005, '220601', '220930');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220101', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100006, '220201', '221231');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100007, '220701', '000000');
insert into training(student_id, start_yearmonthdate, end_yearmonthdate)
values(100008, '220601', '000000');
Update: I am trying to get this work in postgresql as well. But I am new to postgresql and the syntax is taking too much time to change/validate. Can someone provide answer for postgresql as well, thanks.
Upvotes: 0
Views: 412
Reputation: 49395
MySQL has no time series, so you must generate one to get to the data, which is needed to group the data.
Still every STR_TO_DATE
will cost lots of performace.
WITH RECURSIVE t_series as ( select '2022-01-01' as dt UNION SELECT DATE_ADD(t_series.dt, INTERVAL 1 DAY) FROM t_series WHERE DATE_ADD(t_series.dt, INTERVAL 1 DAY) <= '2022-12-31' ) select COUNT(DISTINCT t2.student_id) ,EXTRACT(YEAR_MONTH FROM t1.dt) FROM t_series t1 JOIN training t2 ON t1.dt BETWEEN STR_TO_DATE(t2.start_yearmonthdate,'%y%m%d') AND STR_TO_DATE(IF(t2.end_yearmonthdate = '000000','221231',t2.end_yearmonthdate),'%y%m%d') GROUP BY EXTRACT(YEAR_MONTH FROM t1.dt);
COUNT(DISTINCT t2.student_id) | EXTRACT(YEAR_MONTH FROM t1.dt) ----------------------------: | -----------------------------: 3 | 202201 4 | 202202 4 | 202203 3 | 202204 3 | 202205 5 | 202206 7 | 202207 6 | 202208 6 | 202209 5 | 202210 4 | 202211 4 | 202212
db<>fiddle here
For posgres it will look similar
WITH t_series as ( SELECT t.day::date FROM generate_series(timestamp '2022-01-01' , timestamp '2022-12-31' , interval '1 day') AS t(day)) select COUNT(DISTINCT t2.student_id) ,to_char( t1.day, 'YYYY-MM') FROM t_series t1 JOIN training t2 ON t1.day BETWEEN TO_DATE(t2.start_yearmonthdate,'YYMMDD') AND TO_DATE(CASE WHEN t2.end_yearmonthdate = '000000' THEN '221231' ELSe t2.end_yearmonthdate END,'YYMMDD') GROUP BY to_char( t1.day, 'YYYY-MM');
count | to_char ----: | :------ 3 | 2022-01 4 | 2022-02 4 | 2022-03 3 | 2022-04 3 | 2022-05 5 | 2022-06 7 | 2022-07 6 | 2022-08 6 | 2022-09 5 | 2022-10 4 | 2022-11 4 | 2022-12
db<>fiddle here
Upvotes: 2
Reputation: 95072
One of the main problems is to get a proper table with real dates, for that is easier to work with and less prone to errors.
Once this is done, get all months from the first date to the last date in the table with a recursive query.
Then select the months and count the students who came and went.
with recursive
proper_table as
(
select
student_id,
str_to_date(start_yearmonthdate, '%y%m%d') as start_date,
case when end_yearmonthdate = '000000'
then null
else str_to_date(end_yearmonthdate, '%y%m%d')
end as end_date
from training
),
months (last_date) as
(
select min(last_day(start_date))
from proper_table
union all
select last_date + interval 1 month
from months
where last_date + interval 1 month < (select max(start_date) from proper_table)
or last_date + interval 1 month < (select max(end_date) from proper_table)
)
select
date_format(last_date, '%Y-%m'),
(select count(*) from proper_table where start_date <= months.last_date) -
(select count(*) from proper_table where end_date <= months.last_date)
as number_of_students
from months;
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=b47cecc64b18f48385e94505f5b481bc
Upvotes: 1
Reputation: 51
There are lots of steps involved here, including creating a "months calendar" table using recursive CTE.
I tried my best to make each step clear by using expressive aliases and CTE names; However, you might need to go over each step thoroughly.
WITH Step_0_create_months_calendar AS
(
SELECT CAST('2022-01-01' AS date) AS month_1st_day
UNION ALL
SELECT DATEADD(MONTH, 1, month_1st_day)
FROM Step_0_create_months_calendar
WHERE month_1st_day < '2024-01-01'
)
, Step_1_real_dates AS
(
SELECT student_id
, CAST('20' + start_yearmonthdate AS date) AS start_yearmonthdate
, CAST(CASE
WHEN end_yearmonthdate = '000000' THEN GETDATE()
ELSE '20' + end_yearmonthdate
END AS date) AS end_yearmonthdate
FROM training
)
, step_2_explode_months_per_student AS
(
SELECT *
FROM Step_1_real_dates AS S1
INNER JOIN Step_0_create_months_calendar AS clndr
ON clndr.month_1st_day BETWEEN S1.start_yearmonthdate AND S1.end_yearmonthdate
)
, Step_3_just_student_year_and_month AS
(
SELECT student_id, YEAR(month_1st_day) as year_num ,MONTH(month_1st_day) AS month_num
FROM step_2_explode_months_per_student
)
-- Last step: Aggregating
SELECT year_num, month_num, COUNT(DISTINCT student_id) AS num_of_students
FROM Step_3_just_student_year_and_month
GROUP BY year_num, month_num
ORDER BY year_num, month_num
Upvotes: 2