GROUP BY specified number of rows

Question

I have a table of data which is always sorted by date. I want to group lets say 10 000 LAST rows by 1 000. If I have lets say 15200 rows, 10 groups should consist of rows 5201-6200, 6201-7200, ... ,13201-14200 , 14201-15200. Rows in a table are not numerated. I need this to find sum of values in each of 10 groups

SELECT SUM(quantity)
FROM dataTable
GROUP BY ???

spencer7593 · Accepted Answer

You can use an inline view and user-defined variables to emulate a ROW_NUMBER analytic function, and then "group by" an expression that uses ROW_NUMBER to derive a common value for each "group" of rows.

Something like this, for example:

SELECT SUM(v.quantity) AS sum_quantity
  FROM (
         SELECT d.quantity
              , @rn := @rn + 1 AS rn
           FROM dataTable d
          CROSS
           JOIN (SELECT @rn := -1) i
          ORDER BY d.quantity
          LIMIT 10000
       ) v
 GROUP BY v.rn DIV 1000

The inline view (assigned an alias of v) assigns a "row number" to each row from dataTable. (The inline view aliased as i initializes @rn, a MySQL user-defined variable. We don't really care what that query returns, except that we need the query to return exactly one row because of the JOIN operation; what we really care about is that initializes the user-defined variable, before the rest of the statement runs.

As each row is processed, @rn is incremented by 1, and the current value is returned as a "row number" column (assigned an alias of rn) in the resultset.

(NOTE: we chose to intialize @rn to -1 rather than 0, so that the rn column will be returned with values starting at 0. This will be more convenient than starting rn values at 1, when we later "group" the rows into groups of 1000 using an integer division operation.)

The ORDER BY clause is optional; but if there's a particular "order" by which you determine which 10,000 rows are the "last" row, you'd specify that in the ORDER BY clause. (I specified the quantity column here, because that's the only column I know about in the table... I don't have any information about what identifies whether a row is "before" or "after" some other row.)

The LIMIT clause limits the number of rows returned, so we only get 10,000 rows. This will be the "first" 10,000 rows returned from the query, whatever is specified. (To get the rows in reverse order, so the "last" rows are returned first, add the DESC keyword to the ORDER BY clause.)

The GROUP BY on the outer query uses "integer division" operator. With that expression, rn values 0 thru 999 evaluate to 0, rn values 1000 thru 1999 evaluate to 1, etc.

You can run just the inline view query, to see how the row number is being assigned.

You can add additional expressions to the SELECT list in the outer query, to demonstrate how the statement is working, e.g.

SELECT SUM(v.quantity)    AS sum_quantity
     , SUM(1)             AS row_count
     , MIN(v.rn)          AS rn_min
     , MAX(v.rn)          AS rn_max
     , MAX(v.rn) DIV 1000 AS rn_div_1000

GROUP BY specified number of rows

Answers (2)

Related Questions