Running total of "matches" using a window function in SQL

Question

I want to create a window function that will count how many times the value of the field in the current row appears in the part of the ordered partition coming before the current row. To make this more concrete, suppose we have a table like so:

| id| fruit  | date |
+---+--------+------+
| 1 | apple  |   1  |
| 1 | cherry |   2  |
| 1 | apple  |   3  |
| 1 | cherry |   4  |
| 2 | orange |   1  |
| 2 | grape  |   2  |
| 2 | grape  |   3  |

And we want to create a table like so (omitting the date column for clarity):

| id| fruit  | prior |  
+---+--------+-------+
| 1 | apple  |   0   |
| 1 | cherry |   0   |
| 1 | apple  |   1   |
| 1 | cherry |   1   |
| 2 | orange |   0   |
| 2 | grape  |   0   |
| 2 | grape  |   1   |

Note that for id = 1, moving along the ordered partition, the first entry 'apple' doesn't match anything (since the implied set is empty), and the next fruit, 'cherry', doesn't match either. Then we get to 'apple' again, which is a match, and so on. I'm imagining the SQL looks something like this:

SELECT
  id, fruit,
  /* some window function */ OVER (PARTITION BY id ORDER BY date) AS prior
FROM fruit_table;

But I cannot find anything that looks right. FWIW, I'm using PostgreSQL 8.4.

Erwin Brandstetter · Accepted Answer

You could solve that without a window function rather elegantly with a left self-join and a count():

SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM   tbl t
LEFT   JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP  BY t.id, t.day, t.fruit
ORDER  BY t.id, t.day

I renamed the date column to day because date is a reserved word in standard SQL and the name of a basic data type in PostgreSQL.
I also corrected a mistake in your sample data. The way you had it, the expected output did not check out, which might confuse people.
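
To try the queries in this answer, a minimal test setup could look like the following. This is only a sketch: the table name tbl and the column name day follow the renaming above, while the column types (integer, text) are my assumption, since the question does not state them.

```sql
-- Hypothetical test table; column types are assumed, not taken from the question.
CREATE TABLE tbl (
   id    integer
 , fruit text
 , day   integer
);

-- Sample rows from the question, with the "date" column stored as "day".
INSERT INTO tbl (id, fruit, day) VALUES
   (1, 'apple',  1)
 , (1, 'cherry', 2)
 , (1, 'apple',  3)
 , (1, 'cherry', 4)
 , (2, 'orange', 1)
 , (2, 'grape',  2)
 , (2, 'grape',  3);
```

With these rows, the join query above should return the prior column shown in the question: 0, 0, 1, 1 for id = 1 and 0, 0, 1 for id = 2.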

If your point is to do it with a window function, this one should work:

SELECT id, fruit, day
      ,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM   tbl
ORDER  BY id, day

This works because, to quote the manual:

If frame_end is omitted it defaults to CURRENT ROW.

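Because of the ORDER BY, the default frame runs from the start of the partition through the current row. So you effectively count how many rows had the same (id, fruit) up to and including the current row. That's what the - 1 is for.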
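
To make the frame behaviour visible, here is a sketch of the same query with the default frame written out, followed by a row_number() variant. Both assume the tbl / day naming used in this answer and should give the same result as long as day is unique per (id, fruit).

```sql
-- Same query, with the default frame spelled out explicitly.
SELECT id, fruit, day
      ,count(*) OVER (PARTITION BY id, fruit ORDER BY day
                      RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) - 1 AS prior
FROM   tbl
ORDER  BY id, day;

-- Alternative: number the rows within each (id, fruit) group, starting at 0.
SELECT id, fruit, day
      ,row_number() OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM   tbl
ORDER  BY id, day;
```

The two differ when several rows share the same (id, fruit, day). In RANGE mode those rows are peers and all fall inside the frame, so count(*) - 1 gives each of them the same value, which includes the other peers. row_number() instead numbers them consecutively in an unspecified order. A frame that ends at 1 PRECEDING would exclude the current row directly, but as far as I know that syntax is not available in PostgreSQL 8.4.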
