Dan Scally
Dan Scally

Reputation: 2042

What's an efficient way to count "previous" rows in SQL?

Hard to phrase the title for this one.

I have a table of data which contains a row per invoice. For example:

| Invoice ID | Customer Key |    Date    | Value | Something |
| ---------- | ------------ | ---------- | ------| --------- |
|     1      |     A        | 08/02/2019 |  100  |     1     |
|     2      |     B        | 07/02/2019 |  14   |     0     |
|     3      |     A        | 06/02/2019 |  234  |     1     |
|     4      |     A        | 05/02/2019 |  74   |     1     |
|     5      |     B        | 04/02/2019 |  11   |     1     |
|     6      |     A        | 03/02/2019 |  12   |     0     |

I need to add another column that counts the number of previous rows per CustomerKey, but only if "Something" is equal to 1, so that it returns this:

| Invoice ID | Customer Key |    Date    | Value | Something | Count |
| ---------- | ------------ | ---------- | ------| --------- | ----- |
|     1      |     A        | 08/02/2019 |  100  |     1     |   2   |
|     2      |     B        | 07/02/2019 |  14   |     0     |   1   |
|     3      |     A        | 06/02/2019 |  234  |     1     |   1   |
|     4      |     A        | 05/02/2019 |  74   |     1     |   0   |
|     5      |     B        | 04/02/2019 |  11   |     1     |   0   |
|     6      |     A        | 03/02/2019 |  12   |     0     |   0   |

I know I can do this using either a CTE like this...

(
select
    count(*)
from table
where
    [Customer Key] = t.[Customer Key]
    and [Date] < t.[Date]
    and Something = 1
)

But I have a lot of data and that's pretty slow. I know I can also use cross apply to achieve the same thing, but as far as I can tell that's not any better performing than just using a CTE.

So; is there a more efficient means of achieving this, or do I just suck it up?

EDIT: I originally posted this without the requirement that only rows where Something = 1 are counted. Mea culpa - I asked it in a hurry. Unfortunately I think that this means I can't use row_number() over (partition by [Customer Key])

Upvotes: 3

Views: 3746

Answers (2)

Thom A
Thom A

Reputation: 95554

Assuming you're using SQL Server 2012+ you can use Window Functions:

COUNT(CASE WHEN Something = 1 THEN CustomerKey END) OVER (PARTITION BY CustomerKey ORDER BY [Date]
                                                          ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]

Old answer before new required logic:

COUNT(CustomerKey) OVER (PARTITION BY CustomerKey ORDER BY [Date]
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) -1 AS [Count]

Upvotes: 5

Luis Cazares
Luis Cazares

Reputation: 3585

If you're not using 2012 an alternative is to use ROW_NUMBER

ROW_NUMBER() OVER (PARTITION BY CustomerKey ORDER BY [Date]) - 1 AS Count

Upvotes: 5

Related Questions