Reputation: 1608
I have a log table of user activities. I'm trying to write a query that shows, for each period, the number of unique users and the number of new users.
Sample data:
| uid | act | tm |
| --- | --- | ------------------------ |
| 1 | l | 2019-01-02T00:00:00.000Z |
| 1 | l | 2019-01-05T00:00:00.000Z |
| 2 | l | 2019-02-02T00:00:00.000Z |
| 1 | l | 2019-02-03T00:00:00.000Z |
| 2 | l | 2019-02-04T00:00:00.000Z |
| 3 | l | 2019-02-05T00:00:00.000Z |
| 1 | l | 2019-03-02T00:00:00.000Z |
| 2 | l | 2019-03-02T00:00:00.000Z |
| 3 | l | 2019-03-02T00:00:00.000Z |
| 4 | l | 2019-03-02T00:00:00.000Z |
The first part is easy: count(distinct(uid)) as tot_users
But is there a way to do the second part: count the users who appeared in that period but not before?
Here is what I have so far: https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/1
For future reference, I've updated the db-fiddle with the two proposed solutions. Both work nicely:
https://www.db-fiddle.com/f/8EXsih1VAL1iWXKeauPQiB/6
SELECT
    to_char(date_trunc('month', tm), 'YYYY-MM') as mnth,
    count(uid) as tot_entries,
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT
        CASE
            WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm)
            THEN uid
        END) AS new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
GROUP BY mnth
ORDER BY mnth;

SELECT
    to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
    count(l1.uid) tot_entries,
    count(DISTINCT l1.uid) tot_users,
    count(DISTINCT
        CASE
            WHEN NOT EXISTS (SELECT *
                             FROM logs l2
                             WHERE l2.uid = l1.uid
                               AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM'))
            THEN
                l1.uid
        END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
Upvotes: 1
Views: 194
Reputation: 222582
You could use a window function in a subquery to compute the timestamp of the first log entry of each user, like:
SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l
Then you can analyze the results in an outer query. When a user's first log entry falls within the analysis interval, you can count them as a new user.
Assuming the parameters :start_tm and :end_tm represent the start (inclusive) and end (exclusive) of the analysis period, you would write:
SELECT
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
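As a quick sanity check, the period query can be run against the question's sample data. The sketch below is not PostgreSQL: it uses an in-memory SQLite database through Python's standard `sqlite3` module (window functions need SQLite 3.25 or later), and it assumes `tm` is stored as ISO-8601 text so that plain string comparison orders the timestamps correctly. The February 2019 bounds are just an example period.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
# Assumption: tm kept as ISO-8601 text, which sorts chronologically as a string.
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# Same shape as the query above; the window function runs over all rows
# before the outer WHERE restricts the period.
QUERY = """
SELECT
    COUNT(DISTINCT uid) AS tot_users,
    COUNT(DISTINCT CASE WHEN min_tm >= :start_tm AND min_tm < :end_tm
                        THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
WHERE tm >= :start_tm AND tm < :end_tm
"""

# Example period: February 2019 (start inclusive, end exclusive).
params = {"start_tm": "2019-02-01", "end_tm": "2019-03-01"}
tot_users, tot_new_users = conn.execute(QUERY, params).fetchone()
print(tot_users, tot_new_users)
```

Users 1, 2 and 3 are active in February, but user 1 was first seen in January, so only users 2 and 3 count as new.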
If you need to aggregate by months:
SELECT
    DATE_TRUNC('month', tm) AS my_month,
    COUNT(DISTINCT uid) as tot_users,
    COUNT(DISTINCT CASE WHEN DATE_TRUNC('month', min_tm) = DATE_TRUNC('month', tm) THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER(PARTITION BY uid) min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
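The monthly version can be checked the same way on the sample data. This is again only a sketch in SQLite via Python's `sqlite3` module, not PostgreSQL: `substr(tm, 1, 7)` is used as a stand-in for `DATE_TRUNC('month', tm)`, which works here only under the assumption that `tm` is stored as ISO-8601 text.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# substr(tm, 1, 7) yields 'YYYY-MM'; it replaces DATE_TRUNC('month', ...)
# because SQLite has no such function (assumption: ISO-8601 text timestamps).
QUERY = """
SELECT
    substr(tm, 1, 7) AS my_month,
    COUNT(DISTINCT uid) AS tot_users,
    COUNT(DISTINCT CASE WHEN substr(min_tm, 1, 7) = substr(tm, 1, 7)
                        THEN uid END) AS tot_new_users
FROM (SELECT l.*, MIN(tm) OVER (PARTITION BY uid) AS min_tm FROM logs l) x
GROUP BY my_month
ORDER BY my_month
"""

monthly = conn.execute(QUERY).fetchall()
for month, tot_users, tot_new_users in monthly:
    print(month, tot_users, tot_new_users)
```

A user is counted as new only in the month that contains their first log entry, so each user contributes to `tot_new_users` exactly once across all months.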
Upvotes: 1
Reputation: 37472
You can use conditional aggregation. In a CASE expression, check whether a log entry exists for the same user in an earlier month. If no such entry exists, return the user's ID. Use that expression as the argument to count().
SELECT to_char(date_trunc('month', l1.tm), 'YYYY-MM') mnth,
       count(l1.uid) tot_entries,
       count(DISTINCT l1.uid) tot_users,
       count(DISTINCT CASE
                        WHEN NOT EXISTS (SELECT *
                                         FROM logs l2
                                         WHERE l2.uid = l1.uid
                                           AND to_char(date_trunc('month', l2.tm), 'YYYY-MM') < to_char(date_trunc('month', l1.tm), 'YYYY-MM')) THEN
                          l1.uid
                      END) new_users
FROM logs l1
GROUP BY mnth
ORDER BY mnth;
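To see which rows the NOT EXISTS check actually keeps, it can help to move the same predicate into a WHERE clause and list the (month, user) pairs it selects. The sketch below does that on the question's sample data. It runs on SQLite through Python's `sqlite3` module rather than PostgreSQL, and `substr(tm, 1, 7)` stands in for `to_char(date_trunc('month', tm), 'YYYY-MM')` on the assumption that `tm` is stored as ISO-8601 text.

```python
import sqlite3

# Sample rows from the question: (uid, act, tm).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# Keep a log row only if the same user has no row in an earlier month.
# DISTINCT collapses several entries by one user within their first month.
QUERY = """
SELECT DISTINCT substr(l1.tm, 1, 7) AS mnth, l1.uid
FROM logs l1
WHERE NOT EXISTS (SELECT 1
                  FROM logs l2
                  WHERE l2.uid = l1.uid
                    AND substr(l2.tm, 1, 7) < substr(l1.tm, 1, 7))
ORDER BY mnth, l1.uid
"""

new_user_pairs = conn.execute(QUERY).fetchall()
for mnth, uid in new_user_pairs:
    print(mnth, uid)
```

Each user shows up in exactly one month, their first, which is why counting the distinct user IDs that pass this check per month gives the number of new users.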
Upvotes: 1
Reputation: 880
You might use either a HAVING clause or a self-join. You mentioned a period, so I am not sure about the exact filter, but assuming a simple cutoff date, you can do something like this:
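select
    uid,
    -- users first seen before the cutoff are 'old', the rest are 'new'
    case when mintm < '2019-03-02T00:00:00.000Z' -- cutoff
         then 'old' else 'new'
    end flag
from (
    -- first log entry per user ("table" is a reserved word, so use the logs table)
    select uid, min(tm) mintm from logs
    group by uid ) as first_logins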
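Here is a small sketch of that flagging query run against the question's sample data, assuming the table is the `logs` table from the question. It uses SQLite through Python's `sqlite3` module instead of PostgreSQL, with `tm` stored as ISO-8601 text, and passes the cutoff as a bound parameter (the name `:cutoff` is my own choice).

```python
import sqlite3

# Sample rows from the question: (uid, act, tm).
ROWS = [
    (1, "l", "2019-01-02T00:00:00.000Z"),
    (1, "l", "2019-01-05T00:00:00.000Z"),
    (2, "l", "2019-02-02T00:00:00.000Z"),
    (1, "l", "2019-02-03T00:00:00.000Z"),
    (2, "l", "2019-02-04T00:00:00.000Z"),
    (3, "l", "2019-02-05T00:00:00.000Z"),
    (1, "l", "2019-03-02T00:00:00.000Z"),
    (2, "l", "2019-03-02T00:00:00.000Z"),
    (3, "l", "2019-03-02T00:00:00.000Z"),
    (4, "l", "2019-03-02T00:00:00.000Z"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (uid INTEGER, act TEXT, tm TEXT)")
conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", ROWS)

# Inner query: first log entry per user. Outer query: compare it to the cutoff.
QUERY = """
SELECT uid,
       CASE WHEN mintm < :cutoff THEN 'old' ELSE 'new' END AS flag
FROM (SELECT uid, MIN(tm) AS mintm FROM logs GROUP BY uid) AS first_logins
ORDER BY uid
"""

flags = conn.execute(QUERY, {"cutoff": "2019-03-02T00:00:00.000Z"}).fetchall()
for uid, flag in flags:
    print(uid, flag)
```

Note that the comparison is strict, so a user whose first entry is exactly at the cutoff (user 4 here) is flagged as new.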
Upvotes: 0