Reputation: 7669
I have a small database of various computer parts and prices. There are 2 tables, Parts and Prices.
Parts:
partID desc mfgr timeStamp
---------- ---------- ---------- ----------
1 RAM Crucial 1
2 MOBO MSI 1
3 I7 970 Intel 1
1 RAM Crucial 2
Prices:
productID qty price timeStamp
---------- ---------- ---------- ----------
1 1 50.0 1
1 2 100.0 1
1 3 130.0 1
2 1 140.0 1
3 1 499.99 1
3 1 449.99 2
1 4 150.0 2
2 1 150.0 2
1 1 40.0 2
1 4 200.0 3
I need to grab everything from Parts that has the most recent timestamp, and GROUP_CONCAT(price) for all of the prices that match both the partID and the timestamp. So the output should look something like this:
partID desc mfgr timeStamp GROUP_CONCAT(price)
---------- ---------- ---------- ---------- -------------------
1 RAM Crucial 2 150.0,40
2 MOBO MSI 1 140.0
3 I7 970 Intel 1 499.99
I'm really close, but not quite getting the right results. I have tried
SELECT * FROM Parts INNER JOIN
(SELECT partID, MAX(Parts.timeStamp) as maxTS, GROUP_CONCAT(price) FROM
Parts, Prices WHERE partID = Prices.productID GROUP BY partID) grouped
ON Parts.partID = grouped.partID AND Parts.timeStamp = grouped.maxTS;
But this grabs everything from the pricing table where the part ID matches, whether or not the timestamp also matches.
partID desc mfgr timeStamp partID maxTS GROUP_CONCAT(price)
---------- ---------- ---------- ---------- ---------- ---------- -------------------
2 MOBO MSI 1 2 1 140.0,150.0
3 I7 970 Intel 1 3 1 449.99,499.99
1 RAM Crucial 2 1 2 40.0,50.0,100.0,130
So I changed my command to
SELECT * FROM Parts INNER JOIN
(SELECT partID, MAX(Parts.timeStamp) AS maxTS, GROUP_CONCAT(price)
FROM Parts, Prices
WHERE partID = Prices.productID AND (SELECT MAX(parts.timestamp) FROM Parts) = Prices.timeStamp) grouped
ON Parts.partID = grouped.partID AND Parts.timeStamp = grouped.maxTS;
But this only matches rows from pricing that have the largest timestamp that occurs in parts. (Which is 2)
What am I doing wrong here?
Upvotes: 0
Views: 58
Reputation: 4225
The mistake you are making is that you are grouping price entries before filtering out only the entries that you need. So, your grouped
sub-query will contain all the prices for a partID
, and there is no way to separate them out later, because they are grouped.
The best approach for solving these kind of things is breaking down your queries into pieces that you need.
You say:
I need to grab everything from Parts that has the most recent timestamp
So, let's do that. Note that this will pretty much require a sub-query, or "pivot", since RDBMS are not that good in letting you pick a row on a condition that depends on other rows (in this case, selecting rows where a field is maximum among some group). We'll call this sub-query aux
, and use it to select entries from parts
that match the partID
/timeStamp
combination that satisfies the condition:
select * from parts,
(select partId, max(timeStamp) maxts from parts group by partId) aux
where parts.partId = aux.partId and parts.timeStamp = aux.maxts
This is using implicit joins, you can rewrite this query using JOIN
syntax as well. I personally avoid JOIN
syntax unless I need left, or other special joins:
select * from parts
join (select partId, max(timeStamp) maxts from parts group by partId) aux
on parts.partId = aux.partId and parts.timeStamp = aux.maxts
Now, you wanna join it with the prices of the same ID/timestamp, but group the prices together (as GROUP_CONCAT
). The key here is to only select (which happens before grouping) the price entries that match the "latest" part entries.
Because the first query produces output that can be directly joined with prices tables, the query just needs to be extended to include price table, and grouping:
select parts.partid, parts.desc, group_concat(prices.price) from
parts, prices, (
select partId, max(timeStamp) maxts from parts group by partId) aux
where
parts.partId = aux.partId and
parts.timeStamp = aux.maxts and
prices.productID = parts.partid and
prices.timestamp = parts.timestamp
group by parts.partid, parts.desc
This can be re-written using JOIN
syntax as well:
select parts.partid, parts.desc, group_concat(prices.price)
from parts
join (select partId, max(timeStamp) maxts from parts group by partId) aux
on parts.partId = aux.partId and parts.timeStamp = aux.maxts
join prices on prices.productID = parts.partid and prices.timestamp = parts.timestamp
group by parts.partid, parts.desc
This query is a little bit convoluted, and depending on the data set, it may be beneficial to re-write it in a different way, to make sure database understands (good for the optimizer) what gets filtered first. We can move the filtered "parts" into a sub-query of its own (call it bux
), and then join that with prices
table:
select bux.partid, bux.desc, group_concat(prices.price) from prices
join (
select parts.partId, parts.desc, aux.maxts
from parts join
(select partId, max(timeStamp) maxts from parts group by partId) aux
on parts.partId = aux.partId and parts.timeStamp = aux.maxts
) bux
on prices.productID = bux.partid and prices.timestamp = bux.maxts
group by bux.partid, bux.desc
If you check the execution plans between the two, you will see a difference. Selecting which one to use on production would depend on which one performs better.
http://sqlfiddle.com/#!9/f12c8/10/0
Upvotes: 2