Reputation: 161
So I got a request for some data from someone in my company. This data request requires SQL query of multiple columns from multiple tables BUT can only be the most recent incident from a column from one of those tables. Here is the kicker...each table has a single column that it shares with another table. So to get this "report" I have to do it piece by piece by piece.
Here is how my initial query looked before I realized I only need the most recent update to one of the tables:
SELECT a.description AS "Description",
a.pricing AS "Price",
b.id AS "ID",
c.descriptionb AS "DescriptionB",
c.date AS "date",
d.descriptionc AS "DescriptionC"
FROM database.table1 a,
database.table2 b,
database.table3 c,
database.table4 d
WHERE a.description = b.descriptive_info
AND b.id = c.comp_id
AND c.descriptionb = d.long_description
AND d.id_for_a = a.id
AND a.company IN ( '000', '001', '002', '003', '004' )
AND b.expdate >= Now()
I realized that the "c.date" above needs to only display the most recent date for each unique ID/DescriptionC.
Here is an example result of the initial query:
Description|Price |ID |DescriptionB |date |DescriptionC
---------------------------------------------------------------------
Computer |300 |554 |5% Off |3/2/2010 |Includes CPU
Computer |300 |554 |5% Off |3/2/2010 |Includes DOG
Computer |300 |554 |5% Off |3/2/2010 |Includes CAT
Computer |300 |554 |9% Off |4/3/2011 |Includes CPU
Computer |300 |554 |9% Off |4/3/2011 |Includes DOG
Computer |300 |554 |9% Off |4/3/2011 |Includes CAT
Computer |300 |554 |7% Off |9/1/2019 |Includes CPU
Computer |300 |554 |7% Off |9/1/2019 |Includes DOG
Computer |300 |554 |7% Off |9/1/2019 |Includes CAT
Printer |75 |801 |3% Off |6/3/2012 |Includes DOS
Printer |75 |801 |3% Off |6/3/2012 |Includes PIG
Printer |75 |801 |3% Off |6/3/2012 |Includes RAT
Printer |75 |801 |9% Off |8/3/2013 |Includes DOS
Printer |75 |801 |9% Off |8/3/2013 |Includes PIG
Printer |75 |801 |9% Off |8/3/2013 |Includes RAT
Printer |75 |801 |1% Off |1/3/2019 |Includes DOS
Printer |75 |801 |1% Off |1/3/2019 |Includes PIG
Printer |75 |801 |1% Off |1/3/2019 |Includes RAT
Here is the result of the query by Laurenz below:
Description|Price |ID |DescriptionB |date |DescriptionC
---------------------------------------------------------------------
Computer |300 |554 |7% Off |9/1/2019 |Includes CAT
...close but not quite there yet.
Desired Result:
Description|Price |ID |DescriptionB |date |DescriptionC
---------------------------------------------------------------------
Computer |300 |554 |7% Off |9/1/2019 |Includes CPU
Computer |300 |554 |7% Off |9/1/2019 |Includes DOG
Computer |300 |554 |7% Off |9/1/2019 |Includes CAT
Printer |75 |801 |1% Off |1/3/2019 |Includes DOS
Printer |75 |801 |1% Off |1/3/2019 |Includes PIG
Printer |75 |801 |1% Off |1/3/2019 |Includes RAT
As you can see there are multiple dates for the "same product", duplicate products with varying descriptions, etc. I basically only want the rows with the most recent "date" for each unique ID/DescriptionC. Hopefully this is a little easier to understand than my original post.
BTW these are simplified examples as I don't want to get in trouble by my company but the query and concepts are the same. If you can imagine multiple products with numerous instances of each product multiple times you can imagine how big the data set could get. I only care about the most recent instance of each unique ID/DescriptionC.
Upvotes: 1
Views: 155
Reputation: 161
Thank you so much everyone. The answer from Hal McGee was basically correct! I just had to tweak a few things. Here is an example of the full query I ended up using that functioned as I was needing it to:
SELECT *
FROM (
SELECT
a.description AS "Description",
a.pricing AS "Price",
b.string AS "String",
c.description AS "Description",
c.date AS "date",
d.descriptionb AS "DescriptionB",
ROW_NUMBER() OVER (PARTITION BY d.descriptionc, b.id ORDER BY c.date DESC ) AS rn
FROM
database.table1 a
INNER JOIN database.table2 b ON a.id = b.table1_id
INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
INNER JOIN database.table4 d ON c.value = d.table3_value
WHERE
a.company IN ( '000', '001', '002', '003', '004' )
) x WHERE rn = 1;
Upvotes: 1
Reputation: 21
The following sql can be used to solve the question:
SELECT *
FROM (
SELECT
a.description AS "Description",
a.pricing AS "Price",
b.string AS "String",
c.description AS "Description",
c.date AS "date",
d.descriptionb AS "DescriptionB",
ROW_NUMBER() OVER (ORDER BY c.date DESC PARTITION BY B.ID ) AS rn
FROM
database.table1 a
INNER JOIN database.table2 b ON a.id = b.table1_id
INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
INNER JOIN database.table4 d ON c.value = d.table3_value
WHERE
a.company IN ( '000', '001', '002', '003', '004' )
) x WHERE rn = 1;
Upvotes: 2
Reputation: 222572
I understand that, out of the results from the current query, you just want to select the one that has the maximum value on c.date
.
One solution would be to turn the existing query to a subquery, and use ROW_NUMBER()
to rank the records by descending c.date
. Then, the outer query can just filter on the highest ranked record.
Query:
SELECT *
FROM (
SELECT
a.description AS "Description",
a.pricing AS "Price",
b.string AS "String",
c.description AS "Description",
c.date AS "date",
d.descriptionb AS "DescriptionB",
ROW_NUMBER() OVER (ORDER BY c.date DESC) AS rn
FROM
database.table1 a
INNER JOIN database.table2 b ON a.id = b.table1_id
INNER JOIN database.table3 c ON b.element = c.table2_element AND b.expdate >= Now()
INNER JOIN database.table4 d ON c.value = d.table3_value
WHERE
a.company IN ( '000', '001', '002', '003', '004' )
) x WHERE rn = 1;
PS:
always prefer explicit joins instead of old-school, implicit joins; I changed the initial query accordingly
if you need the maximum date by partition (which is not obvious to tell without seeing sample data), then you simply need to add a PARTITION BY
clause to the ROW_NUMBER()
function.
Upvotes: 4