Reputation: 1391
I have a table (MySQL) that has a column called binID. The values in this column can range from 1 to 70.
What I want to do is select the unique values of this column (should be the numbers from 1 to 70), then iterate over them using each (lets call it theBinID) as a parameter into another SELECT statement such as:
SELECT * FROM MyTable WHERE binID = theBinID ORDER BY createdDate DESC LIMIT 10
Basically, I am looking to get the 10 most recent rows for each binID.
I do not believe there is a way to do this with a basic SQL statement, although I would love that to be the answer, so I have written a stored procedure that creates a cursor on the SELECT DISTINCT of binIDs, and then iterates over it and populates a temp table.
My problem is, this is for an optimization, and if I fetch 100K rows I get 1.7 second average time. Executing my stored procedure to get 700 rows (10 records for 70 bins) is taking 1.4 seconds. I realize that 0.3 seconds could be perceived as considerable improvement, but I am hoping to get this sub-second for 100K rows.
Is there a better way??
The full stored proc is this:
BEGIN
DECLARE done INT DEFAULT FALSE;
DECLARE binID INT;
DECLARE cur1 CURSOR FOR SELECT DISTINCT heatmapBinID from MEStressTest ORDER BY heatmapBinID ASC;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
DROP TEMPORARY TABLE IF EXISTS TempResults;
CREATE TEMPORARY TABLE TempResults (
`recordID` text NOT NULL,
`queryTerm` text NOT NULL,
`recordCreated` double(11,0) NOT NULL,
`recordByID` text NOT NULL,
`recordByName` text NOT NULL,
`recordText` text NOT NULL,
`recordSource` text NOT NULL,
`rerecordCount` int(11) NOT NULL DEFAULT '0',
`timecodeOffset` int(11) NOT NULL DEFAULT '-1',
`recordByImageURL` text NOT NULL,
`canDelete` int(11) NOT NULL DEFAULT '1',
`heatmapBinID` int(11) DEFAULT NULL,
`timelineBinID` int(11) DEFAULT NULL,
PRIMARY KEY (`recordID`(20))
);
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO binID;
IF done THEN
LEAVE read_loop;
END IF;
INSERT INTO TempResults (recordID, queryTerm, recordCreated, recordByID, recordByName, recordText, recordSource, rerecordCount, timecodeOffset, recordByImageURL, canDelete, heatmapBinID, timelineBinID)
SELECT * FROM MEStressTest WHERE heatmapBinID = binID ORDER BY recordCreated DESC LIMIT numRecordsPerBin;
END LOOP;
CLOSE cur1;
SELECT * FROM TempResults ORDER BY heatmapBinID ASC, recordCreated DESC;
END
Upvotes: 0
Views: 334
Reputation: 39423
Try to simulate ROW_NUMBER OVER PARTITION in MySQL: http://www.sqlfiddle.com/#!2/fd8b5/4
Given this data:
create table sentai(
band varchar(50),
member_name varchar(50),
member_year int not null
);
insert into sentai(band, member_name, member_year) values
('BEATLES','JOHN',1960),
('BEATLES','PAUL',1961),
('BEATLES','GEORGE',1962),
('BEATLES','RINGO',1963),
('VOLTES V','STEVE',1970),
('VOLTES V','MARK',1971),
('VOLTES V','BIG BERT',1972),
('VOLTES V','LITTLE JOHN',1973),
('VOLTES V','JAMIE',1964),
('ERASERHEADS','ELY',1990),
('ERASERHEADS','RAYMUND',1991),
('ERASERHEADS','BUDDY',1992),
('ERASERHEADS','MARCUS',1993);
Object, find all three most recent members on each band.
First we have to put a row_number on each member based on most year(via descending order)
select *,
@rn := @rn + 1 as rn
from (sentai s, (select @rn := 0) as vars)
order by s.band, s.member_year desc;
Output:
| BAND | MEMBER_NAME | MEMBER_YEAR | @RN := 0 | RN |
|-------------|-------------|-------------|----------|----|
| BEATLES | RINGO | 1963 | 0 | 1 |
| BEATLES | GEORGE | 1962 | 0 | 2 |
| BEATLES | PAUL | 1961 | 0 | 3 |
| BEATLES | JOHN | 1960 | 0 | 4 |
| ERASERHEADS | MARCUS | 1993 | 0 | 5 |
| ERASERHEADS | BUDDY | 1992 | 0 | 6 |
| ERASERHEADS | RAYMUND | 1991 | 0 | 7 |
| ERASERHEADS | ELY | 1990 | 0 | 8 |
| VOLTES V | LITTLE JOHN | 1973 | 0 | 9 |
| VOLTES V | BIG BERT | 1972 | 0 | 10 |
| VOLTES V | MARK | 1971 | 0 | 11 |
| VOLTES V | STEVE | 1970 | 0 | 12 |
| VOLTES V | JAMIE | 1964 | 0 | 13 |
Then we reset the row number when a member is on different band:
select *,
@rn := IF(@pg = s.band, @rn + 1, 1) as rn,
@pg := s.band
from (sentai s, (select @pg := null, @rn := 0) as vars)
order by s.band, s.member_year desc;
Output:
| BAND | MEMBER_NAME | MEMBER_YEAR | @PG := NULL | @RN := 0 | RN | @PG := S.BAND |
|-------------|-------------|-------------|-------------|----------|----|---------------|
| BEATLES | RINGO | 1963 | (null) | 0 | 1 | BEATLES |
| BEATLES | GEORGE | 1962 | (null) | 0 | 2 | BEATLES |
| BEATLES | PAUL | 1961 | (null) | 0 | 3 | BEATLES |
| BEATLES | JOHN | 1960 | (null) | 0 | 4 | BEATLES |
| ERASERHEADS | MARCUS | 1993 | (null) | 0 | 1 | ERASERHEADS |
| ERASERHEADS | BUDDY | 1992 | (null) | 0 | 2 | ERASERHEADS |
| ERASERHEADS | RAYMUND | 1991 | (null) | 0 | 3 | ERASERHEADS |
| ERASERHEADS | ELY | 1990 | (null) | 0 | 4 | ERASERHEADS |
| VOLTES V | LITTLE JOHN | 1973 | (null) | 0 | 1 | VOLTES V |
| VOLTES V | BIG BERT | 1972 | (null) | 0 | 2 | VOLTES V |
| VOLTES V | MARK | 1971 | (null) | 0 | 3 | VOLTES V |
| VOLTES V | STEVE | 1970 | (null) | 0 | 4 | VOLTES V |
| VOLTES V | JAMIE | 1964 | (null) | 0 | 5 | VOLTES V |
Then we select only the most recent three members on each band:
select x.band, x.member_name, x.member_year
from
(
select *,
@rn := IF(@pg = s.band, @rn + 1, 1) as rn,
@pg := s.band
from (sentai s, (select @pg := null, @rn := 0) as vars)
order by s.band, s.member_year desc
) as x
where x.rn <= 3
order by x.band, x.member_year desc;
Output:
| BAND | MEMBER_NAME | MEMBER_YEAR |
|-------------|-------------|-------------|
| BEATLES | RINGO | 1963 |
| BEATLES | GEORGE | 1962 |
| BEATLES | PAUL | 1961 |
| ERASERHEADS | MARCUS | 1993 |
| ERASERHEADS | BUDDY | 1992 |
| ERASERHEADS | RAYMUND | 1991 |
| VOLTES V | LITTLE JOHN | 1973 |
| VOLTES V | BIG BERT | 1972 |
| VOLTES V | MARK | 1971 |
While windowing function(e.g. ROW_NUMBER OVER PARTITION) is not yet available on MySQL, just simulate it with variables. Please let us know if this is faster than cursor approach
How it looks like on windowing-capable RDBMS: http://www.sqlfiddle.com/#!1/fd8b5/6
with member_recentness as
(
select row_number() over each_band as recent, *
from sentai
window each_band as (partition by band order by member_year desc)
)
select *
from member_recentness
where recent <= 3;
Output:
| RECENT | BAND | MEMBER_NAME | MEMBER_YEAR |
|--------|-------------|-------------|-------------|
| 1 | BEATLES | RINGO | 1963 |
| 2 | BEATLES | GEORGE | 1962 |
| 3 | BEATLES | PAUL | 1961 |
| 1 | ERASERHEADS | MARCUS | 1993 |
| 2 | ERASERHEADS | BUDDY | 1992 |
| 3 | ERASERHEADS | RAYMUND | 1991 |
| 1 | VOLTES V | LITTLE JOHN | 1973 |
| 2 | VOLTES V | BIG BERT | 1972 |
| 3 | VOLTES V | MARK | 1971 |
Upvotes: 1
Reputation: 3479
If you try to inner join 2 tables without any joining key, it will be a cartesian product of the 2 tables, i.e.:
SELECT *
FROM MyTable t
INNER JOIN (SELECT DISTINCT binId FROM MyTable) AS u
WHERE
t.binID = theBinID
ORDER BY t.createdDate DESC LIMIT 10
You may refer to this
Upvotes: 0
Reputation: 3308
SELECT * FROM MyTable WHERE binID IN (SELECT DISTINCT(bin_id) FROM mysql_table) ORDER BY createdDate DESC LIMIT 10;
This is not tested, nevermind the syntax.
Add indexes to increase performance.
Upvotes: 0