psrpsrpsr
psrpsrpsr

Reputation: 457

What is the most performant way to rewrite a correlated subquery in the SELECT clause?

I am trying to count whether a user has visited a site in three time ranges:

I am using Netezza, which does NOT support correlated subqueries in the SELECT clause. See Rextester for successful query that must be re-written to NOT use a correlated subquery: http://rextester.com/JGR62033

Sample Data:

| user_id | last_visit | num_days_since_2017117 |
|---------|------------|------------------------|
| 1234    | 2017-11-02 | 15.6                   |
| 1234    | 2017-09-30 | 48.6                   |
| 1234    | 2017-09-03 | 75.0                   |
| 1234    | 2017-08-21 | 88.0                   |
| 9876    | 2017-10-03 | 45.0                   |
| 9876    | 2017-07-20 | 120.0                  |
| 5545    | 2017-09-15 | 63.0                   |

Desired Output:

| user_id | last_30 | btwn_31_60 | btwn_61_90 |
|---------|---------|------------|------------|
| 1234    | 1       | 1          | 1          |
| 5545    | 0       | 0          | 1          |
| 9876    | 0       | 1          | 0          |

Upvotes: 0

Views: 304

Answers (2)

Aaron Dietz
Aaron Dietz

Reputation: 10277

Here is one way with conditional aggregation, Rextester:

select 
    user_id
    ,MAX(case when '2017-11-17'-visit_date <=30
          then 1
          else 0
     end) as last_30
    ,MAX(case when '2017-11-17'-visit_date >=31
               and '2017-11-17'-visit_date <=60
          then 1
          else 0
     end) as between_31_60
    ,MAX(case when '2017-11-17'-visit_date >=61
              and '2017-11-17'-visit_date <=90
          then 1
          else 0
     end) as between_61_90
from 
    visits
group by user_id
order by user_id

Upvotes: 2

Ken White
Ken White

Reputation: 125728

I don't know the specific DBMS you're using, but if it supports CASE or an equivalent you don't need a correlated sub-query; you can do it with a combination of SUM() and CASE.

Untested in your DBMS, of course, but it should give you a starting point:

SELECT 
  user_id, 
  SUM(CASE WHEN num_days <= 30 then 1 else 0 end) as last_30, 
  SUM(CASE WHEN num_days > 30 AND numdays < 61 then 1 else 0 end) as btwn_31_60,
  SUM(CASE WHEN num_days >= 61 then 1 else 0 end) as btwn_61_90
FROM 
  YourTableName  -- You didn't provide a tablename
GROUP BY 
  user_id 

Since your values are floating point and not integer, you may need to adjust the values used for the day ranges to work with your specific requirements.

Upvotes: 2

Related Questions