rishat
rishat

Reputation: 8376

Multiple joins of the same dataset

I am composing a complex query in T-SQL. The dataset I have is a single table (let's call it Table T), and the data look like

| Happened                | Contributor | Status | Direction | Purchased |
|-------------------------|-------------|--------|-----------|-----------|
| 2014-10-06 01:00:00.000 | A           | 0      | NULL      | NULL      |
| 2014-10-06 02:00:00.000 | A           | 1      | NULL      | NULL      |
| 2014-10-06 03:00:00.000 | A           | 2      | inbound   | NULL      |
| 2014-10-06 04:00:00.000 | A           | 0      | NULL      | yes       |
| 2014-10-06 05:00:00.000 | A           | 2      | outbound  | yes       |
| 2014-10-06 06:00:00.000 | B           | 2      | inbound   | NULL      |

So I have

What I need is a query that, for each distinct contributor, selects

and displays the whole set of contributors even if other fields in the row are empty.

What I tried were multiple joins, just like

...
FROM Table T
JOIN Table T2 ON
    (T.Contributor = T2.Contributor
    AND T.Happened < T2.Happened
    AND T2.Status = 1
    AND T1.Status = 0)
JOIN Table T3 ON
...

The result dataset I'm after should look like

| Contributor | StatusChangedFrom0To1   | StatusWasNot0AndDirectionWasInbound | StatusWasNot0AndPuchasedWasYes |
|-------------|-------------------------|-------------------------------------|--------------------------------|
| A           | 2014-10-06 02:00:00.000 | 2014-10-06 03:00:00.000             | 2014-10-06 05:00:00.000        |
| B           | NULL                    | 2014-10-06 06:00:00.000             | NULL                           |

What approach should I follow and what direction should I dig to get the result desired? Should I use certain type of join (e.g. full outer join)?

I use MS SQL Server 2008, and I'm tied to this version for, you know, "enterprise and stuff" reason, so upgrade to any newer version is unlikely.

Upvotes: 0

Views: 64

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269953

If I assume that the statuses only increase, then the earliest time the status changed from 0 to 1 is the earliest time the status is 1 (this is true of the sample data in your question):

select contributor,
       min(case when status = 1 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from table t
group by contributor;

If the assumption for StatusChangedFrom0To1 is not true, the query is still possible as a conditional aggregation, but it will require additional work to get this variable. In SQL Server 2012+, you can use lag() for this purpose:

select contributor,
       min(case when status = 1 and prevstatus = 0 then happened end) as StatusChangedFrom0To1,
       min(case when status <> 0 and direction = 'inbound' then happened end) as StatusWasNot0AndDirectionWasInbound,
       min(case when status <> 0 and purchased = 'yes' then happened end) as StatusWasNot0AndPuchasedWasYes,
       sum(case when status <> 0 and purchased = 'yes' then 1 else 0 end) as cnt
from (select t.*, lag(status) over (partition by Contributor order by happened) as prevstatus
      from table t
     ) t
group by contributor;

In earlier versions, I would use a correlated subquery for equivalent functionality.

EDIT:

The correlated subquery looks like:

from (select t.*,
             (select top 1 t2.status
              from table t2
              where t.contributor = t2.contributor and t2.happened < t.happened
              order by t2.happened desc
             ) as prevstatus
      from table t
     ) t

Upvotes: 1

Related Questions