Corey
Corey

Reputation: 845

MySQL Group By Consecutive Rows

I have a feed application that I am trying to group results from consecutively. My table looks like this:

    postid | posttype | target | action     |        date         | title      | content
      1    |  userid  |  NULL  | upgrade    | 0000-01-00 00:00:00 | Upgraded 1 | exmple
      1    |  userid  |  NULL  | upgrade    | 0000-01-00 00:00:01 | Upgraded 2 | exmple
      1    |  userid  |  NULL  | downgrade  | 0000-01-00 00:00:02 | Downgraded | exmple
      1    |  userid  |  NULL  | upgrade    | 0000-01-00 00:00:03 | Upgraded   | exmple

What I would like the outcome to be is:

    postid | posttype | target | action     |        date         | title      | content
      1    |  userid  |  NULL  | upgrade    | 0000-01-00 00:00:01 | Upgrade 1  | exmple,exmple
      1    |  userid  |  NULL  | downgrade  | 0000-01-00 00:00:02 | Downgraded | exmple
      1    |  userid  |  NULL  | upgrade    | 0000-01-00 00:00:03 | Upgraded   | exmple

So as you can see because Upgrade 1 & Upgrade 2 were sent Consecutively, it groups them together. The "Action" table is a reference, and should be used for the consecutive grouping as well as the postid & posttype.

I looked around on SO but didnt see anything quite like mine. Thanks in advance for any help.

Upvotes: 5

Views: 3407

Answers (2)

DRapp
DRapp

Reputation: 48139

Here's another version that works with MySQL Variables and doesn't require 3 level nesting deep. The first one pre-sorts the records in order by postID and Date and assigns them a sequential number per group whenever any time a value changes in one of the Post ID, Type and/or action. From that, Its a simple group by... no comparing record version T to T2 to T3... what if you wanted 4 or 5 criteria... would you have to nest even more entries?, or just add 2 more @sql variables to the comparison test...

Your call on which is more efficient...

select
      PreQuery.postID,
      PreQuery.PostType,
      PreQuery.Target,
      PreQuery.Action,
      PreQuery.Title,
      min( PreQuery.Date ) as FirstActionDate,
      max( PreQuery.Date ) as LastActionDate,
      count(*) as ActionEntries,
      group_concat( PreQuery.content ) as Content
   from
      ( select
              t.*,
              @lastSeq := if( t.action = @lastAction
                          AND t.postID = @lastPostID
                          AND t.postType = @lastPostType, @lastSeq, @lastSeq +1 ) as ActionSeq,
              @lastAction := t.action,
              @lastPostID := t.postID,
              @lastPostType := t.PostType
           from
              t,
              ( select @lastAction := ' ',
                       @lastPostID := 0,
                       @lastPostType := ' ',
                       @lastSeq := 0 ) sqlVars
           order by
              t.postid,
              t.date ) PreQuery
   group by
      PreQuery.postID,
      PreQuery.ActionSeq,
      PreQuery.PostType,
      PreQuery.Action    

Here's my link to SQLFiddle sample

For the title, you might want to adjust the line...

group_concat( distinct PreQuery.Title ) as Titles,

At least this will give DISTINCT titles concatinated... much tougher to get let without nesting this entire query one more level by having the max query date and other elements to get the one title associated with that max date per all criteria.

Upvotes: 5

AndreKR
AndreKR

Reputation: 33678

There is no primary key in your table so for my example I used date. You should create an auto increment value and use that instead of the date in my example.

This is a solution (view on SQL Fiddle):

SELECT 
  postid,
  posttype,
  target,
  action,
  COALESCE((
    SELECT date
    FROM t t2
    WHERE t2.postid = t.postid
    AND t2.posttype = t.posttype
    AND t2.action = t.action
    AND t2.date > t.date
    AND NOT EXISTS (
      SELECT TRUE
      FROM t t3
      WHERE t3.date > t.date
      AND t3.date < t2.date
      AND (t3.postid != t.postid OR t3.posttype != t.posttype OR t3.action != t.action)
    )
  ), t.date) AS group_criterion,
  MAX(title),
  GROUP_CONCAT(content)
FROM t
GROUP BY 1,2,3,4,5
ORDER BY group_criterion

It basically reads:

For each row create a group criterion and in the end group by it.
This criterion is the highest date of the rows following the current one and having the same postid, posttype and action as the current one but there may be not a row of different postid, posttype or action between them.
In other words, the group criterion is the highest occurring date in a group of consecutive entries.

If you use proper indexes it shouldn't be terribly slow but if you have a lot of rows you should think of caching this information.

Upvotes: 1

Related Questions