CapitanCastor
CapitanCastor

Reputation: 31

SQL COUNT By month

I have the following table that registers the days students went to school. I need to count the days they are PRESENT, but also need to count the total of school days for each month. (When ASISTENCIA is either 0 or 1)

This what I have so far, but it doesn't count the total.

BEGIN
    SELECT 
        u.user_id user_id,
        u.user_first_name as names,
        u.user_last_name_01 as lastname1,
        u.user_last_name_02 as lastname2,
        MONTH(a.FECHA_ASISTENCIA) month,
        COUNT(*) as absent_days,
        p.PHONE as phone,
        p.CITY as city,
        @EDUCATION_LEVEL_ID
    FROM 
        users u
        inner join asistencia a ON u.user_id = a.USER_ID
        inner join profile p ON u.rut_SF = p.RUT_SF         
    WHERE 
        a.ASISTENCIA = 0 -- NOT PRESENT
        AND a.EDUCATION_LEVEL_ID = @EDUCATION_LEVEL_ID  
        AND YEAR(a.FECHA_ASISTENCIA) = @EDUCATION_LEVEL_YEAR
    GROUP BY 
        u.user_id,
        u.user_first_name,
        u.user_last_name_01,
        u.user_last_name_02,
        MONTH(a.FECHA_ASISTENCIA),
        p.TELEFONO,
        p.CIUDAD_DOM    
    ORDER BY mes
END 

ATTENDANCE

USER_ID DATE ATTENDANCE EDUCATION_LEVEL_ID
123 2021-04-13 0 1
123 2021-04-14 1 1

DESIRED OUTPUT

names lastname1 lastname2 month absent_days total_class_days city
JOHN SMITH SMITH 3 10 24 CITY
JOHN SMITH SMITH 4 8 24 CITY

Upvotes: -1

Views: 168

Answers (1)

Josh
Josh

Reputation: 1601

Without examples of what these tables look like, it is hard to give you a solid answer. However, it appears as though your biggest challenge is that ABSTENTIA does not exist.

This is a common problem for analysis - you need to create rows that do not exist (when the user was absent).

The general approach is to:

  1. create a list of unique users
  2. create a list of unique dates you care about
  3. Cross Join (cartesian join) these to create every possible combination of user and date
  4. Outer Join #3 to #4 so you can populate a PRESENT flag, and now you can see both who were PRESENT and which were not.
  5. Filter out rows which don't apply (for instance if a user joined on 3/4/2021 then ignore the blank rows before this date)

You can accomplish this with some SQL that looks like this:

WITH GLOBAL_SPINE AS (
  SELECT
    ROW_NUMBER() OVER (ORDER BY NULL) as INTERVAL_ID,
    DATEADD('DAY', (INTERVAL_ID - 1), '2021-01-01'::timestamp_ntz) as SPINE_START,
    DATEADD('DAY', INTERVAL_ID, '2022-01-01'::timestamp_ntz) as SPINE_END
  FROM TABLE (GENERATOR(ROWCOUNT => 365))
), 
GROUPS AS (
  SELECT 
    USERID,
    MIN(DESIRED_INTERVAL) AS LOCAL_START, 
    MAX(DESIRED_INTERVAL) AS LOCAL_END
  FROM RASGO.PUBLIC.RASGO_SDK__OP4__AGGREGATE_TRANSFORM__8AB1FEDF90
  GROUP BY 
    USERID 
), 
GROUP_SPINE AS (
  SELECT 
    USERID,
    SPINE_START AS GROUP_START, 
    SPINE_END AS GROUP_END
  FROM GROUPS G
  CROSS JOIN LATERAL (
    SELECT
      SPINE_START, SPINE_END
    FROM GLOBAL_SPINE S
    WHERE S.SPINE_START BETWEEN G.LOCAL_START AND G.LOCAL_END
  )
)
SELECT 
  G.USERID AS GROUP_BY_USERID,
  GROUP_START, 
  GROUP_END,
  T.*
FROM GROUP_SPINE G
LEFT JOIN {{ your_table }} T
  ON DESIRED_INTERVAL >= G.GROUP_START 
  AND DESIRED_INTERVAL < G.GROUP_END 
   AND G.USERID = T.USERID;

The above script works on Snowflake, but the syntax might be slightly different depending on your RDBMS. There are also some other tweaks you can make regarding when you insert the blank rows, I chose 'local' which means that we begin inserting rows for each user on their very first day. You could change this to global if you wanted to populate data from every single day between 1/1/2021 and 1/1/2022.

I used Rasgo to generate this SQL

Upvotes: 0

Related Questions