gidgicken
gidgicken

Reputation: 21

PostgreSQL - Filter column 2 results based on column 1

Forgive a novice question. I am new to postgresql.

I have a database full of transactional information. My goal is to iterate through each day since the first transaction, and show how many unique users made a purchase on that day, or in the 30 days previous to that day.

So the # of unique users on 02/01/2016 should show all unique users from 01/01/2016 through 02/01/2016. The # of unique users on 02/02/2016 should show all unique users from 01/02/2016 through 02/02/2016.

Here is a fiddle with some sample data: http://sqlfiddle.com/#!15/b3d90/1

The result should be something like this:

December 17 2014 -- 1
December 18 2014 -- 2
December 19 2014 -- 3
...
January 13 2015 -- 16
January 19 2015 -- 15
January 20 2015 -- 15
...

The best I've come up with is the following:

SELECT
to_char(S.created, 'YYYY-MM-DD') AS my_day,
COUNT(DISTINCT 
    CASE
      WHEN S.created > S.created - INTERVAL '30 days'
      THEN S.user_id
      END)
FROM
    transactions S
GROUP BY my_day
ORDER BY my_day;

As you can see, I have no idea how I could reference what exists in column one in order to specify what date range should be included in the filter.

Any help would be much appreciated!

Upvotes: 1

Views: 55

Answers (1)

Hambone
Hambone

Reputation: 16377

I think if you do a self-join, it would give you the results you seek:

select
  t1.created,
  count (distinct t2.user_id)
from
  transactions t1
  join transactions t2 on
    t2.created between t1.created - interval '30 days' and t1.created
group by
  t1.created
order by
  t1.created

That said, I think this is going to do form of a cartesian join in the background, so for large datasets I doubt it's very efficient. If you run into huge performance problems, there are ways to make this a lot faster... but before you address that, find out if you need to.

-- EDIT 8/20/16 --

In response to your issue with the performance of this... yes, it's a pig. I admit it. I encountered a similar issue here:

PostgreSQL Joining Between Two Values

The same concept for your example is this:

with xtrans as (
  select created, created + generate_series(0, 30) as create_range, user_id
  from transactions
)
select
  t1.created,
  count (distinct t2.user_id)
from
  transactions t1
  join xtrans t2 on
    t2.create_range = t1.created
group by
  t1.created
order by
  t1.created

It's not as easy to follow, but it should yield identical results, only it will be significantly faster because it's not doing the "glorified cross join."

Upvotes: 1

Related Questions