Reputation: 153
I’m trying to pull information from a DB that has data pumped into it by another source. The way things are, every time this source updates information about a Product, instead of doing an Update…it just does an Insert. So you have a lot of duplicate information in there and so queries like:
SELECT
product_data.ProductUPC,
product_data.NAME,
product_data.Shelf,
product_data.Create_Time,
--Supplier--
Supplier = CASE supplier_data.Supplier_Code
WHEN '01' THEN 'BlueBerries'
WHEN '02' THEN 'Grey'
WHEN '03' THEN 'Coco'
ELSE 'OTHER'
END
--JOINS--
FROM
product_data
INNER JOIN
supplier_data ON supplier_data.ProductID = product_data.ProductID
WHERE
product_data.ProductUPC IS NOT NULL
Returns something like:
ProductUPC Product Name Shelf Supplier Create_Time
12345 Cookie Cutter NULL NULL 2015-01-06 16:11:00.667
12345 Cookie Cutter NULL BlueBerries 2015-01-22 16:11:00.667
12345 Cookie Cutter 12 BlueBerries 2015-03-06 16:11:00.667
00040 NULL NULL NULL 2015-01-06 16:11:00.667
00040 Water Hose NULL Grey 2015-01-22 16:11:00.667
00040 Water Hose 11 Grey 2015-03-06 16:11:00.667
I only want to pull the row with the most information for each productUPC, which is usually the row with the latest "Create_Time", a column located in all tables in the DB. So I want a return like this:
ProductUPC Product Name Shelf Supplier Create_Time
12345 Cookie Cutter 12 BlueBerries 2015-03-06 16:11:00.667
00040 Water Hose 11 Grey 2015-03-06 16:11:00.667
So I've been trying MAX(Create_Time)
SELECT
product_data.ProductUPC,
product_data.NAME,
product_data.Shelf,
product_data.Create_Time,
--Supplier--
Supplier = CASE supplier_data.Supplier_Code WHEN
WHEN '01' THEN 'BlueBerries'
WHEN '02' THEN 'Grey'
WHEN '03' THEN 'Coco'
ELSE 'OTHER' END
--JOINS--
FROM product_data
INNER JOIN
supplier_data ON supplier_data.ProductID = product_data.ProductID
WHERE
product_data.ProductUPC IS NOT NULL
AND product_data.create_time = (
SELECT MAX(product_data.create_time)
FROM product_data
)
But it only returns the headers. (No information).
Trying the MAX within the Select:
SELECT MAX(product_data.Create_Time)
product_data.ProductUPC,
product_data.NAME,
product_data.Shelf
--Supplier--
Supplier = CASE supplier_data.Supplier_Code WHEN
WHEN '01' THEN 'BlueBerries'
WHEN '02' THEN 'Grey'
WHEN '03' THEN 'Coco'
ELSE 'OTHER' END
--JOINS--
FROM product_data
INNER JOIN
supplier_data ON supplier_data.ProductID = product_data.ProductID
WHERE
product_data.ProductUPC IS NOT NULL
Returns error stating my other columns are "is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause." So I know I'm on the wrong track here.
Any tips where I'm going wrong? Thanks.
Upvotes: 1
Views: 38
Reputation: 1952
You were on the right track with your GROUP BY
. Try adding something like this below your supplier_data
JOIN
:
JOIN
(SELECT
ProductUPC, MAX(Create_Time) Max_Create_Time
FROM product_data
GROUP BY ProductUPC) A ON product_data.ProductUPC = A.ProductUPC
AND product_data.Create_time = A.Max_Create_Time
EDIT: Also, you mention that this type of data is "duplicate data". That's not necessarily the case. This is actually not a terrible way of tracking changes. With this system, you can track every change made to a product over time. There are certainly other ways to track changes, but this certainly works. You just have to be aware of it when you're writing queries and developing reports.
Upvotes: 2