Reputation: 1
I have the following table in SQL-
user_id purchase_date revenues
-------------------------------------
A 2018-01-01 30
A 2018-01-02 28
A 2018-01-04 45
A 2018-01-06 55
B 2018-01-03 20
B 2018-01-05 21
And I have to select only the rows in which the revenue from a user was greater than the revenue on their first day.
Meaning the expected result should be-
user_id purchase_date revenues
-------------------------------------
A 2018-01-04 45
A 2018-01-06 55
B 2018-01-05 21
I'm just starting to learn SQL, and I can do each query (by date or revenue per user) on its own, but I can't seem to get the hang of it with this complexity.
Thanks a lot!
Upvotes: 0
Views: 1858
Reputation: 4474
A cross apply
will make your life a lot easier here. This is probably the "easiest to understand" (but not the most efficient) way of writing the query, because it aligns very naturally with the "in plain english" way of writing the question.
The cross apply
operator lets you write a select statement, and then "apply" that select statement to each row on the left hand side.
In "plain English" (but written using more SQL-ish terms), you want
the rows where the revenues for a user_id is greater than the revenues for that user_id on their top 1 purchase_date when ordered by date ascending.
Let's tackle the second part of that. For a user_id, get the top 1 revenues ordered by purchase_date ascending:
select top 1 revenues
from MyTable
where user_id = -- [just one particular user id]
order by purchase_date asc
OK, but we need that for every user. So we use cross apply:
select t.user_id,
t.purchase_date,
t.revenues,
first_day_revenues_by_user.revenues
from MyTable t
cross apply (
select top 1 revenues
from MyTable
where user_id = t.user_id -- use the user_id from the outer select
order by purchase_date asc
)
as first_day_revenues_by_user -- give the subquery an alias we can use
Now we have a select statement which has all the columns of the original table, plus an additional column containing the revenues
value for their first day. So now we can add a simple where
clause:
select t.user_id,
t.purchase_date,
t.revenues
from MyTable t
cross apply (
select top 1 revenues
from MyTable
where user_id = t.user_id
order by purchase_date asc
)
as first_day_revenues_by_user
where t.revenues > first_day_revenues_by_user.revenues
Upvotes: 0
Reputation: 2216
You can use windowed function ROW_NUMBER()
with A as
(
select user_id, purchase_date, revenues
from
(values
('A', '2018-01-01', 30),
('A', '2018-01-02', 28),
('A', '2018-01-04', 45),
('A', '2018-01-06', 55),
('B', '2018-01-03', 20),
('B', '2018-01-05', 21)) as T(user_id, purchase_date, revenues)
),
B as (
select
ROW_NUMBER() over (partition by user_id order by purchase_date) as n,
user_id, purchase_date, revenues from A
),
C as (
select * from B where n = 1
)
select
A.user_id, A.purchase_date, A.revenues
from
A inner join C on (A.user_id = C.user_id and A.revenues > C.revenues)
also you can use FIRST_VALUE function
with A as
(
select user_id, purchase_date, revenues
from
(values
('A', '2018-01-01', 30),
('A', '2018-01-02', 28),
('A', '2018-01-04', 45),
('A', '2018-01-06', 55),
('B', '2018-01-03', 20),
('B', '2018-01-05', 21)) as T(user_id, purchase_date, revenues)
),
B as (
select
user_id, purchase_date, revenues,
FIRST_VALUE(revenues) over (partition by user_id order by purchase_date) as n
from A
)
select user_id, purchase_date, revenues from B where revenues > n;
Upvotes: 2