Reputation: 5
I have a table of data which is always sorted by date. I want to group lets say 10 000 LAST rows by 1 000. If I have lets say 15200 rows, 10 groups should consist of rows 5201-6200, 6201-7200, ... ,13201-14200 , 14201-15200. Rows in a table are not numerated. I need this to find sum of values in each of 10 groups
SELECT SUM(quantity)
FROM dataTable
GROUP BY ???
Upvotes: 0
Views: 226
Reputation: 95043
First: How to get the last 10000 rows? Sort by date and cut after 10000st row.
select quantity
from datatable
order by thedate desc
limit 10000;
Then how to create row numbers which you can build groups with? Cross join the result with an incrementing variable. (Here I must rely on the fact that MySQL increments the variable after sorting the rows. This is no guaranteed behavior. In fact, I haven't found anything on this in the docs. Maybe they will gurantantee this one day, because many people already rely on this behavior.)
select @rownum := @rownum + 1 as rn, q.quantity
from
(
select quantity, thedate
from datatable
order by thedate desc
limit 10000
) q
cross join (select @rownum := 0) r
order by thedate desc;
At last build groups, by dividing the row number by 1000:
select truncate( (rn-1) / 1000, 0 ) as groupno, sum(quantity)
from
(
select @rownum := @rownum + 1 as rn, q.quantity
from
(
select quantity, thedate
from datatable
order by thedate desc
limit 10000
) q
cross join (select @rownum := 0) r
order by thedate desc
)
group by groupno;
Upvotes: 0
Reputation: 108450
You can use an inline view and user-defined variables to emulate a ROW_NUMBER analytic function, and then "group by" an expression that uses ROW_NUMBER to derive a common value for each "group" of rows.
Something like this, for example:
SELECT SUM(v.quantity) AS sum_quantity
FROM (
SELECT d.quantity
, @rn := @rn + 1 AS rn
FROM dataTable d
CROSS
JOIN (SELECT @rn := -1) i
ORDER BY d.quantity
LIMIT 10000
) v
GROUP BY v.rn DIV 1000
The inline view (assigned an alias of v
) assigns a "row number" to each row from dataTable. (The inline view aliased as i
initializes @rn
, a MySQL user-defined variable. We don't really care what that query returns, except that we need the query to return exactly one row because of the JOIN
operation; what we really care about is that initializes the user-defined variable, before the rest of the statement runs.
As each row is processed, @rn is incremented by 1, and the current value is returned as a "row number" column (assigned an alias of rn) in the resultset.
(NOTE: we chose to intialize @rn
to -1 rather than 0, so that the rn
column will be returned with values starting at 0. This will be more convenient than starting rn
values at 1, when we later "group" the rows into groups of 1000 using an integer division operation.)
The ORDER BY
clause is optional; but if there's a particular "order" by which you determine which 10,000 rows are the "last" row, you'd specify that in the ORDER BY clause. (I specified the quantity
column here, because that's the only column I know about in the table... I don't have any information about what identifies whether a row is "before" or "after" some other row.)
The LIMIT
clause limits the number of rows returned, so we only get 10,000 rows. This will be the "first" 10,000 rows returned from the query, whatever is specified. (To get the rows in reverse order, so the "last" rows are returned first, add the DESC
keyword to the ORDER BY clause.)
The GROUP BY
on the outer query uses "integer division" operator. With that expression, rn
values 0 thru 999 evaluate to 0, rn values 1000 thru 1999 evaluate to 1, etc.
You can run just the inline view query, to see how the row number is being assigned.
You can add additional expressions to the SELECT list in the outer query, to demonstrate how the statement is working, e.g.
SELECT SUM(v.quantity) AS sum_quantity
, SUM(1) AS row_count
, MIN(v.rn) AS rn_min
, MAX(v.rn) AS rn_max
, MAX(v.rn) DIV 1000 AS rn_div_1000
Upvotes: 1