user39676
user39676

Reputation: 5

GROUP BY specified number of rows

I have a table of data which is always sorted by date. I want to group lets say 10 000 LAST rows by 1 000. If I have lets say 15200 rows, 10 groups should consist of rows 5201-6200, 6201-7200, ... ,13201-14200 , 14201-15200. Rows in a table are not numerated. I need this to find sum of values in each of 10 groups

SELECT SUM(quantity)
FROM dataTable
GROUP BY ???

Upvotes: 0

Views: 226

Answers (2)

Thorsten Kettner
Thorsten Kettner

Reputation: 95043

First: How to get the last 10000 rows? Sort by date and cut after 10000st row.

select quantity
from datatable
order by thedate desc
limit 10000;

Then how to create row numbers which you can build groups with? Cross join the result with an incrementing variable. (Here I must rely on the fact that MySQL increments the variable after sorting the rows. This is no guaranteed behavior. In fact, I haven't found anything on this in the docs. Maybe they will gurantantee this one day, because many people already rely on this behavior.)

select @rownum := @rownum + 1 as rn, q.quantity
from
(
  select quantity, thedate
  from datatable
  order by thedate desc
  limit 10000
) q
cross join (select @rownum := 0) r
order by thedate desc;

At last build groups, by dividing the row number by 1000:

select truncate( (rn-1) / 1000, 0 ) as groupno, sum(quantity)
from
(
  select @rownum := @rownum + 1 as rn, q.quantity
  from
  (
    select quantity, thedate
    from datatable
    order by thedate desc
    limit 10000
  ) q
  cross join (select @rownum := 0) r
  order by thedate desc
)
group by groupno;

Upvotes: 0

spencer7593
spencer7593

Reputation: 108450

You can use an inline view and user-defined variables to emulate a ROW_NUMBER analytic function, and then "group by" an expression that uses ROW_NUMBER to derive a common value for each "group" of rows.

Something like this, for example:

SELECT SUM(v.quantity) AS sum_quantity
  FROM (
         SELECT d.quantity
              , @rn := @rn + 1 AS rn
           FROM dataTable d
          CROSS
           JOIN (SELECT @rn := -1) i
          ORDER BY d.quantity
          LIMIT 10000
       ) v
 GROUP BY v.rn DIV 1000

The inline view (assigned an alias of v) assigns a "row number" to each row from dataTable. (The inline view aliased as i initializes @rn, a MySQL user-defined variable. We don't really care what that query returns, except that we need the query to return exactly one row because of the JOIN operation; what we really care about is that initializes the user-defined variable, before the rest of the statement runs.

As each row is processed, @rn is incremented by 1, and the current value is returned as a "row number" column (assigned an alias of rn) in the resultset.

(NOTE: we chose to intialize @rn to -1 rather than 0, so that the rn column will be returned with values starting at 0. This will be more convenient than starting rn values at 1, when we later "group" the rows into groups of 1000 using an integer division operation.)

The ORDER BY clause is optional; but if there's a particular "order" by which you determine which 10,000 rows are the "last" row, you'd specify that in the ORDER BY clause. (I specified the quantity column here, because that's the only column I know about in the table... I don't have any information about what identifies whether a row is "before" or "after" some other row.)

The LIMIT clause limits the number of rows returned, so we only get 10,000 rows. This will be the "first" 10,000 rows returned from the query, whatever is specified. (To get the rows in reverse order, so the "last" rows are returned first, add the DESC keyword to the ORDER BY clause.)

The GROUP BY on the outer query uses "integer division" operator. With that expression, rn values 0 thru 999 evaluate to 0, rn values 1000 thru 1999 evaluate to 1, etc.

You can run just the inline view query, to see how the row number is being assigned.

You can add additional expressions to the SELECT list in the outer query, to demonstrate how the statement is working, e.g.

SELECT SUM(v.quantity)    AS sum_quantity
     , SUM(1)             AS row_count
     , MIN(v.rn)          AS rn_min
     , MAX(v.rn)          AS rn_max
     , MAX(v.rn) DIV 1000 AS rn_div_1000

Upvotes: 1

Related Questions