John Horton
John Horton

Reputation: 4282

Running total of "matches" using a window function in SQL

I want to create a window function that will count how many times the value of the field in the current row appears in the part of the ordered partition coming before the current row. To make this more concrete, suppose we have a table like so:

| id| fruit  | date |
+---+--------+------+
| 1 | apple  |   1  |
| 1 | cherry |   2  |
| 1 | apple  |   3  |
| 1 | cherry |   4  |
| 2 | orange |   1  |
| 2 | grape  |   2  |
| 2 | grape  |   3  |

And we want to create a table like so (omitting the date column for clarity):

| id| fruit  | prior |  
+---+--------+-------+
| 1 | apple  |   0   |
| 1 | cherry |   0   |
| 1 | apple  |   1   |
| 1 | cherry |   1   |
| 2 | orange |   0   |
| 2 | grape  |   0   |
| 2 | grape  |   1   |

Note that for id = 1, moving along the ordered partition, the first entry 'apple' doesn't match anything (since the implied set is empty), the next fruit, 'cherry' also doesn't match. Then we get to 'apple' again, which is a match and so on. I'm imagining the SQL looks something like this:

SELECT
id, fruit, 
<some kind of INTERSECT?> OVER (PARTITION BY id ORDER by date) AS prior
FROM fruit_table; 

But I cannot find anything that looks right. FWIW, I'm using PostgreSQL 8.4.

Upvotes: 1

Views: 1928

Answers (1)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656371

You could solve that without a window function rather elegantly with a self-left join and a count():

SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM   tbl t
LEFT   JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP  BY t.id, t.day, t.fruit
ORDER  BY t.id, t.day

If your point is to do it with a window function, this one should work:

SELECT id, fruit, day
      ,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM   tbl
ORDER  BY id, day

If frame_end is omitted it defaults to CURRENT ROW.

  • You effectively count how many rows had the same (id, fruit) on prior days - including the current row. That's what the - 1 is for.

Upvotes: 3

Related Questions