Reputation: 4787
I am using PostgreSQL and my data looks something like this:
UserID TimeStamp
1 2014-02-03
2 2014-02-03
3 2014-02-03
1 2014-03-03
2 2014-03-03
6 2014-03-03
7 2014-03-03
This is just dummy data for 2 days in which some UserID
is getting repeated on both the days. I would like to find out the number of repeated UserId
every month. For this example the final result set should look like:
Count Year Month
0 2014 2
2 2014 3
In the above table, March 2014 has 2 repeated UserID
and Feb 2014 has none.
I can find out the distinct UserID
for each month but not the repeated UserID
. Any help in this regard would be much appreciated.
Upvotes: 0
Views: 1100
Reputation: 125214
select
count(distinct userid) as "Count",
extract(year from t0.timestamp) as "Year",
extract(month from t0.timestamp) as "Month"
from
t t1
inner join
t t0 using (userid)
where t0.timestamp < date_trunc('month', t1.timestamp)
group by 2, 3
or may be faster
select
count(distinct userid) as "Count",
extract(year from t0.timestamp) as "Year",
extract(month from t0.timestamp) as "Month"
from t t1
where exists (
select 1
from t
where
userid = t1.userid
and
timestamp < date_trunc('month', t1.timestamp)
)
group by 2, 3
Upvotes: 1
Reputation: 1269693
Is this what you want?
select yyyymm, sum(case when cnt > 1 then 1 else 0 end) as dupcnt
from (select to_char(timestamp, 'YYYY-MM') as yyyymm, userid, count(*) as cnt
from table t
group by to_char(timestamp, 'YYYY-MM'), userid
) t
group by yyyymm
order by yyyymm;
Upvotes: 0
Reputation: 60462
To rephrase your question:
How many users are not new (i.e. already visited the shop/website/whatever in a previous month) for each month?
SELECT
yr, mon,
COUNT(*) AS all_users,
COUNT(*) - SUM(repeated) AS new_users,
SUM(repeated) AS existing_users
FROM
(
SELECT UserId,
EXTRACT(YEAR FROM TimeStamp) AS yr,
EXTRACT(MONTH FROM TimeStamp) AS mon,
CASE WHEN ROW_NUMBER() -- 1st time users get 0
OVER (PARTITION BY UserId
ORDER BY EXTRACT(YEAR FROM TimeStamp) ,
EXTRACT(MONTH FROM TimeStamp)) = 1
THEN 0
ELSE 1
END AS repeated
FROM vt
GROUP BY UserId,
EXTRACT(YEAR FROM TimeStamp),
EXTRACT(MONTH FROM TimeStamp)
) AS dt
GROUP BY yr,mon
ORDER BY 1,2
The inner GROUP BY is needed if there are multiple rows for a user within the same month.
Upvotes: 0
Reputation: 272
This might work, have not tested it out yet.
SELECT
COUNT(DISTINCT(UserId))
, EXTRACT(YEAR FROM TIMESTAMP TimeStamp) AS Year
, EXTRACT(MONTH FROM TIMESTAMP Timestamp) AS Month
FROM TABLE
GROUP BY TimeStamp
Upvotes: 0