Reputation: 107
Each product has a LatLng
. Over time, sometimes products can be duplicated but it is difficult to spot. The price or images may be slightly different.
I want to compare Products
by their ProductType
and their Price
in each location and put them on a map so duplicates can be found more easily.
For each product I calculate 5% of its price so I can then add/subtract this from other Product prices to find rough matches.
Products
ProductID | ProductType | Price | Latitude | Longitude |
---|---|---|---|---|
ABC | Red Widget | 500 | 12.34 | 67.89 |
DEF | Red Widget | 505 | 12.34 | 67.89 |
MNO | Red Widget | 480 | 12.34 | 67.89 |
RST | Red Widget | 500 | 12.34 | 67.89 |
UVW | Red Widget | 300 | 12.34 | 67.89 |
JKL | Blue Widget | 800 | 76.54 | 32.10 |
XYZ | Blue Widget | 800 | 45.67 | 23.45 |
The expected result is that ABC
, DEF
, MNO
AND RST
should be returned because they are all Red Widgets
, are in the same location and are have either exactly the same price or prices within 5% of each other's price.
UVW
is outside the percentage so likely not a duplicate and should not be returned.
JKL
and XYZ
are Blue Widgets
with the same price, but not in the same location, and thus not returned.
SQL
This is what I have so far but it is returning far too many results. I think it's returning rows that are matching outside of each location. The ANDs in the JOIN look correct to me. It feels like the ORs may be allowing more flexibility in matching but the enclosing brackets look correct...
WITH cte AS
(
SELECT
p.ProductID,
p.ProductType,
p.Price,
(p.Price / 100) * 5 AS PricePercent,
ROUND(p.Latitude, 3) AS Latitude,
ROUND(p.Longitude, 3) AS Longitude
FROM
Products p
WHERE
p.Latitude IS NOT NULL AND p.Longitude IS NOT NULL
)
SELECT
DISTINCT a.ProductID,
a.Price,
a.Latitude, a.Longitude
FROM
cte a
INNER JOIN
/* ProductIDs don't match */
cte b ON a.ProductID <> b.ProductID
/* match only where location is the same*/
AND a.Latitude = b.Latitude
AND a.Longitude = b.Longitude
/* match only where Product Type is the same*/
AND a.ProductType = b.ProductType
/*match only if price is the same, or within 5% above or 5% below price*/
AND (/*same price*/
b.Price = a.Price
OR
/*b.price is within percentage over a.price*/
(b.Price > a.Price AND b.Price < (a.Price+a.PricePercent))
OR
/*b.price is within percentage under a.price */
(b.Price < a.Price AND b.Price >(a.Price-a.PricePercent)))
Upvotes: 0
Views: 269
Reputation: 5894
First this thing:
AND (/*same price*/
b.Price = a.Price
OR
/*b.price is within percentage over a.price*/
(b.Price > a.Price AND b.Price < (a.Price+a.PricePercent))
OR
/*b.price is within percentage under a.price */
(b.Price < a.Price AND b.Price >(a.Price-a.PricePercent))
)
Is (or should be) the same as this:
AND (
-- price within range +/- percent
b.Price < (a.Price+a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
And second for me it works:
MS SQL Server 2017 Schema Setup:
Query 1:
WITH cte AS
(
SELECT
p.ProductID,
p.ProductType,
p.Price,
(p.Price / 100) * 5 AS PricePercent,
ROUND(p.Latitude, 3) AS Latitude,
ROUND(p.Longitude, 3) AS Longitude
FROM
Products p
WHERE
p.Latitude IS NOT NULL AND p.Longitude IS NOT NULL
)
SELECT
DISTINCT a.ProductID,
a.Price,
a.Latitude, a.Longitude
FROM
cte a
INNER JOIN
/* ProductIDs don't match */
cte b ON a.ProductID <> b.ProductID
/* match only where location is the same*/
AND a.Latitude = b.Latitude
AND a.Longitude = b.Longitude
/* match only where Product Type is the same*/
AND a.ProductType = b.ProductType
/*match only if price is the same, or within 5% above or 5% below price*/
AND (
-- price within range +/- percent
b.Price < (a.Price+a.PricePercent)
AND b.Price >(a.Price-a.PricePercent)
)
ProductID | Price | Latitude | Longitude |
---|---|---|---|
ABC | 500 | 12 | 68 |
DEF | 505 | 12 | 68 |
RST | 500 | 12 | 68 |
Upvotes: 3