megv
megv

Reputation: 1471

Count Distinct Active users per month

I am trying to calculate active monthly users based on signup and cancel dates. Several users have NULL cancel dates (since still active). This query has a bunch of users as just null action_year and action_month.

SELECT
  T.action_year,
  T.action_month,
  COUNT(USerID) active_users
FROM
  (
  SELECT DISTINCT UserID, YEAR(SignupDate) action_year, MONTH(SignupDate) action_month FROM Stat 
  UNION
  SELECT DISTINCT UserID, YEAR(CancelDate) action_year, MONTH(CancelDate) action_date FROM  Stat 
  ) T
GROUP BY
  T.action_year,
  T.action_month
ORDER BY
  T.action_year ASC,
  T.action_month ASC

Upvotes: 0

Views: 851

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271241

Presumably active users are those where the month is somehow between the signup and cancel dates. This is tricky to define. Is it active on any date of the month? Active on the last day? Active on the first day?

I will assume the first. Active on any day during the month.

The idea is that the number of actives in a given month are all people who have signed up previously and not yet stopped. Given this observation, the calculation proceeds as follows:

  1. Get a list of all years and months. The following query assumes that signups occur every month to simplify this part.
  2. Use a correlated subquery to get the number of actives. This will do comparisons to the "yyyymm" form of the date.
  3. Be sure to remember that CancelDate can be NULL.

The resulting query is:

select ym.the_year, ym.the_month,
       (select count(*)
        from stat s
        where date_format(SignupDate, '%Y-%m') <= ym.yyyymm and
              (CancelDate is null or date_format(CancelDate, '%Y-%m') >= ym.yyyymm)
       ) as NumActives
from (select distinct year(SignupDate) as the_year, month(SignupDate) as the_month,
             date_format(SignupDate, '%Y-%m') as yyyymm
      from stat
     ) ym

Upvotes: 1

Related Questions