Patthebug
Patthebug

Reputation: 4787

Find number of repeating visitors in a month - PostgreSQL

I am using PostgreSQL and my data looks something like this:

UserID    TimeStamp
1          2014-02-03
2          2014-02-03
3          2014-02-03
1          2014-03-03
2          2014-03-03
6          2014-03-03
7          2014-03-03

This is just dummy data for 2 days in which some UserID is getting repeated on both the days. I would like to find out the number of repeated UserId every month. For this example the final result set should look like:

Count    Year    Month
0        2014    2
2        2014    3

In the above table, March 2014 has 2 repeated UserID and Feb 2014 has none. I can find out the distinct UserID for each month but not the repeated UserID. Any help in this regard would be much appreciated.

Upvotes: 0

Views: 1100

Answers (4)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125214

select
    count(distinct userid) as "Count",
    extract(year from t0.timestamp) as "Year",
    extract(month from t0.timestamp) as "Month"
from
    t t1
    inner join
    t t0 using (userid)
where t0.timestamp < date_trunc('month', t1.timestamp)
group by 2, 3

or may be faster

select
    count(distinct userid) as "Count",
    extract(year from t0.timestamp) as "Year",
    extract(month from t0.timestamp) as "Month"
from t t1
where exists (
    select 1
    from t
    where
        userid = t1.userid
        and
        timestamp < date_trunc('month', t1.timestamp)
)
group by 2, 3

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269693

Is this what you want?

select yyyymm, sum(case when cnt > 1 then 1 else 0 end) as dupcnt
from (select to_char(timestamp, 'YYYY-MM') as yyyymm, userid, count(*) as cnt
      from table t
      group by to_char(timestamp, 'YYYY-MM'), userid
     ) t
group by yyyymm
order by yyyymm;

Upvotes: 0

dnoeth
dnoeth

Reputation: 60462

To rephrase your question:

How many users are not new (i.e. already visited the shop/website/whatever in a previous month) for each month?

SELECT
   yr, mon,
   COUNT(*) AS all_users,
   COUNT(*) - SUM(repeated) AS new_users,
   SUM(repeated) AS existing_users
FROM
 (
   SELECT UserId, 
      EXTRACT(YEAR FROM TimeStamp) AS yr,
      EXTRACT(MONTH FROM TimeStamp) AS mon,
      CASE WHEN ROW_NUMBER() -- 1st time users get 0
                OVER (PARTITION BY UserId
                      ORDER BY EXTRACT(YEAR FROM TimeStamp) ,
                               EXTRACT(MONTH FROM TimeStamp)) = 1 
           THEN 0 
           ELSE 1
      END AS repeated
   FROM vt
   GROUP BY UserId,
      EXTRACT(YEAR FROM TimeStamp),
      EXTRACT(MONTH FROM TimeStamp)
 ) AS dt
GROUP BY yr,mon
ORDER BY 1,2

The inner GROUP BY is needed if there are multiple rows for a user within the same month.

Upvotes: 0

Shi Wei
Shi Wei

Reputation: 272

This might work, have not tested it out yet.

SELECT 
  COUNT(DISTINCT(UserId))
  , EXTRACT(YEAR FROM TIMESTAMP TimeStamp) AS Year
  , EXTRACT(MONTH FROM TIMESTAMP Timestamp) AS Month 
FROM TABLE 
GROUP BY TimeStamp

Upvotes: 0

Related Questions