Reputation: 8376
I am composing a complex query in T-SQL. The dataset I have is a single table (let's call it Table T
), and the data look like
| Happened | Contributor | Status | Direction | Purchased |
|-------------------------|-------------|--------|-----------|-----------|
| 2014-10-06 01:00:00.000 | A | 0 | NULL | NULL |
| 2014-10-06 02:00:00.000 | A | 1 | NULL | NULL |
| 2014-10-06 03:00:00.000 | A | 2 | inbound | NULL |
| 2014-10-06 04:00:00.000 | A | 0 | NULL | yes |
| 2014-10-06 05:00:00.000 | A | 2 | outbound | yes |
| 2014-10-06 06:00:00.000 | B | 2 | inbound | NULL |
So I have
T.Happened
) and a subject who caused an event (T.Contributor
);What I need is a query that, for each distinct contributor, selects
T.Direction = 'inbound'
;T.Purchased = 'yes'
and status is not 0.and displays the whole set of contributors even if other fields in the row are empty.
What I tried were multiple joins, just like
...
FROM Table T
JOIN Table T2 ON
(T.Contributor = T2.Contributor
AND T.Happened < T2.Happened
AND T2.Status = 1
AND T1.Status = 0)
JOIN Table T3 ON
...
The result dataset I'm after should look like
| Contributor | StatusChangedFrom0To1 | StatusWasNot0AndDirectionWasInbound | StatusWasNot0AndPuchasedWasYes |
|-------------|-------------------------|-------------------------------------|--------------------------------|
| A | 2014-10-06 02:00:00.000 | 2014-10-06 03:00:00.000 | 2014-10-06 05:00:00.000 |
| B | NULL | 2014-10-06 06:00:00.000 | NULL |
What approach should I follow and what direction should I dig to get the result desired? Should I use certain type of join (e.g. full outer join)?
I use MS SQL Server 2008, and I'm tied to this version for, you know, "enterprise and stuff" reason, so upgrade to any newer version is unlikely.
Upvotes: 0
Views: 64
Reputation: 1269953
If I assume that the statuses only increase, then the earliest time the status changed from 0 to 1 is the earliest time the status is 1 (this is true of the sample data in your question):
select contributor,
min(case when status = 1 then happened end) as StatusChangedFrom0To1,
min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from table t
group by contributor;
If the assumption for StatusChangedFrom0To1
is not true, the query is still possible as a conditional aggregation, but it will require additional work to get this variable. In SQL Server 2012+, you can use lag()
for this purpose:
select contributor,
min(case when status = 1 and prevstatus = 0 then happened end) as StatusChangedFrom0To1,
min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from (select t.*, lag(status) over (partition by Contributor order by happened) as prevstatus
from table t
) t
group by contributor;
In earlier versions, I would use a correlated subquery for equivalent functionality.
EDIT:
The correlated subquery looks like:
from (select t.*,
(select top 1 t2.status
from table t2
where t.contributor = t2.contributor and t2.happened < t.happened
order by t2.happened desc
) as prevstatus
from table t
) t
Upvotes: 1