Fábio Antunes
Fábio Antunes

Reputation: 17164

Select number of rows for each group where two column values makes one group

I have a two select statements joined by UNION ALL. In the first statement a where clause gathers only rows that have been shown previously to the user. The second statement gathers all rows that haven't been shown to the user, therefore I end up with the viewed results first and non-viewed results after.

Of course this could simply be achieved with the same select statement using a simple ORDER BY, however the reason for two separate selects is simple after you realize what I hope to accomplish.

Consider the following structure and data.

+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
| 1  | 1    | 10  | true   | .... |
| 2  | 10   | 1   | true   | .... |
| 3  | 1    | 10  | true   | .... |
| 4  | 6    | 8   | true   | .... |
| 5  | 1    | 10  | true   | .... |
| 6  | 10   | 1   | true   | .... |
| 7  | 8    | 6   | true   | .... |
| 8  | 10   | 1   | true   | .... |
| 9  | 6    | 8   | true   | .... |
| 10 | 2    | 3   | true   | .... |
| 11 | 1    | 10  | true   | .... |
| 12 | 8    | 6   | true   | .... |
| 13 | 10   | 1   | false  | .... |
| 14 | 1    | 10  | false  | .... |
| 15 | 6    | 8   | false  | .... |
| 16 | 10   | 1   | false  | .... |
| 17 | 8    | 6   | false  | .... |
| 18 | 3    | 2   | false  | .... |
+----+------+-----+--------+------+

Basically I wish all non viewed rows to be selected by the statement, that is accomplished by checking weather the viewed column is true or false, pretty simple and straightforward, nothing to worry here.

However when it comes to the rows already viewed, meaning the column viewed is TRUE, for those records I only want 3 rows to be returned for each group.

The appropriate result in this instance should be the 3 most recent rows of each group.

+----+------+-----+--------+------+
| id | from | to  | viewed | data |
+----+------+-----+--------+------+
| 6  | 10   | 1   | true   | .... |
| 7  | 8    | 6   | true   | .... |
| 8  | 10   | 1   | true   | .... |
| 9  | 6    | 8   | true   | .... |
| 10 | 2    | 3   | true   | .... |
| 11 | 1    | 10  | true   | .... |
| 12 | 8    | 6   | true   | .... |
+----+------+-----+--------+------+

As you see from the ideal result set we have three groups. Therefore the desired query for the viewed results should show a maximum of 3 rows for each grouping it finds. In this case these groupings were 10 with 1 and 8 with 6, both which had three rows to be shown, while the other group 2 with 3 only had one row to be shown.

Please note that where from = x and to = y, makes the same grouping as if it was from = y and to = x. Therefore considering the first grouping (10 with 1), from = 10 and to = 1 is the same group if it was from = 1 and to = 10.

However there are plenty of groups in the whole table that I only wish the 3 most recent of each to be returned in the select statement, and thats my problem, I not sure how that can be accomplished in the most efficient way possible considering the table will have hundreds if not thousands of records at some point.

Thanks for your help.

Note: The columns id, from, to and viewed are indexed, that should help with performance.

PS: I'm unsure on how to name this question exactly, if you have a better idea, be my guest and edit the title.

Upvotes: 0

Views: 1858

Answers (1)

O. Jones
O. Jones

Reputation: 108641

What a hairball! This gets progressively harder as you move from most recent, to second most recent, to third most recent.

Let's put this together by getting the list of IDs we need. Then we can pull the items from the table by ID.

This, relatively easy, query gets you the ids of your most recent items

 SELECT id FROM
    (SELECT max(id) id, fromitem, toitem
       FROM stuff
      WHERE viewed = 'true'
      GROUP BY fromitem, toitem
    )a

Fiddle: http://sqlfiddle.com/#!2/f7045/27/0

Next, we need to get the ids of the second most recent items. To do this, we need a self-join style query. We need to do the same summary but on a virtual table that omits the most recent items.

select id from (
  select max(b.id) id, b.fromitem, b.toitem
    from stuff a
    join
           (select id, fromitem, toitem
            from stuff
           where viewed = 'true'
            ) b on (    a.fromitem = b.fromitem 
                    and a.toitem = b.toitem
                    and b.id < a.id)
   where a.viewed = 'true'
   group by fromitem, toitem
  )c

Fiddle: http://sqlfiddle.com/#!2/f7045/44/0

Finally, we need to get the ids of the third most recent items. Mercy! We need to join that query we just had, to the table again.

select id from
(
  select max(d.id) id, d.fromitem, d.toitem
    from stuff d
     join 
    (
       select max(b.id) id, b.fromitem, b.toitem
          from stuff a
          join
            (
               select id, fromitem, toitem
                 from stuff
                where viewed = 'true'
            ) b on  (    a.fromitem = b.fromitem 
                     and a.toitem = b.toitem
                     and b.id < a.id)
          where a.viewed = 'true'
          group by fromitem, toitem
     ) c on (    d.fromitem = c.fromitem
             and d.toitem = c.toitem
             and d.id < c.id)
    where d.viewed='true'
  group by d.fromitem, d.toitem
 ) e

Fiddle: http://sqlfiddle.com/#!2/f7045/45/0

So, now we take the union of all those ids, and use them to grab the right rows from the table, and we're done.

SELECT * 
  FROM STUFF
 WHERE ID IN
(

SELECT id FROM
    (SELECT max(id) id, fromitem, toitem
       FROM stuff
      WHERE viewed = 'true'
      GROUP BY fromitem, toitem
    )a
UNION
select id from (
  select max(b.id) id, b.fromitem, b.toitem
    from stuff a
    join
           (select id, fromitem, toitem
            from stuff
           where viewed = 'true'
            ) b on (    a.fromitem = b.fromitem 
                    and a.toitem = b.toitem
                    and b.id < a.id)
   where a.viewed = 'true'
   group by fromitem, toitem
  )c
UNION
select id from
(
  select max(d.id) id, d.fromitem, d.toitem
    from stuff d
     join 
    (
       select max(b.id) id, b.fromitem, b.toitem
          from stuff a
          join
            (
               select id, fromitem, toitem
                 from stuff
                where viewed = 'true'
            ) b on  (    a.fromitem = b.fromitem 
                     and a.toitem = b.toitem
                     and b.id < a.id)
          where a.viewed = 'true'
          group by fromitem, toitem
     ) c on (    d.fromitem = c.fromitem
             and d.toitem = c.toitem
             and d.id < c.id)
    where d.viewed='true'
  group by d.fromitem, d.toitem
 ) e
UNION
select id from stuff where viewed='false'
)
order by viewed desc, fromitem, toitem, id desc

Tee hee. Too much SQL. Fiddle: http://sqlfiddle.com/#!2/f7045/47/0

And now, we need to cope with your last requirement, the requirement that your graph is unordered. That is, that from=n to=m is the same as from=m to=n.

To do this we need a virtual table instead of the physical table. This will do the trick.

 SELECT id, least(fromitem, toitem) fromitem, greatest(fromitem,toitem) toitem, data
   FROM stuff

Now we need to use this virtual table, this view, everywhere the physical table used to appear. Let's use a view to do this.

CREATE VIEW 
AS 
SELECT id,
       LEAST(fromitem, toitem) fromitem,
       GREATEST (fromitem, toitem) toitem,
       viewed,
       data;

So, our ultimate query is:

SELECT *
      FROM stuff
     WHERE ID IN
    (

    SELECT id FROM
        (SELECT max(id) id, fromitem, toitem
           FROM STUFF_UNORDERED
          WHERE viewed = 'true'
          GROUP BY fromitem, toitem
        )a
    UNION
    SELECT id FROM (
      SELECT max(b.id) id, b.fromitem, b.toitem
        FROM STUFF_UNORDERED a
        JOIN
               (SELECT id, fromitem, toitem
                FROM STUFF_UNORDERED
               WHERE viewed = 'true'
                ) b ON (    a.fromitem = b.fromitem
                        AND a.toitem = b.toitem
                        AND b.id < a.id)
       WHERE a.viewed = 'true'
       GROUP BY fromitem, toitem
      )c
    UNION
    SELECT id FROM
    (
      SELECT max(d.id) id, d.fromitem, d.toitem
        FROM STUFF_UNORDERED d
         JOIN
        (
           SELECT max(b.id) id, b.fromitem, b.toitem
              FROM STUFF_UNORDERED a
              JOIN
                (
                   SELECT id, fromitem, toitem
                     FROM STUFF_UNORDERED
                    WHERE viewed = 'true'
                ) b ON  (    a.fromitem = b.fromitem
                         AND a.toitem = b.toitem
                         AND b.id < a.id)
              WHERE a.viewed = 'true'
              GROUP BY fromitem, toitem
         ) c ON (    d.fromitem = c.fromitem
                 AND d.toitem = c.toitem
                 AND d.id < c.id)
        WHERE d.viewed='true'
      GROUP BY d.fromitem, d.toitem
     ) e
    UNION
    SELECT id FROM STUFF_UNORDERED WHERE viewed='false'
    )
    ORDER BY viewed DESC,
            least(fromitem, toitem),
            greatest(fromitem, toitem),
            id DESC

Fiddle: http://sqlfiddle.com/#!2/8c154/4/0

Upvotes: 3

Related Questions