Reputation: 703
My small grocery store has partnered with a third party to take online orders. I made an Excel sheet that queries our product database for a bunch of product information, and I feed that data into another macro-enabled worksheet that does the heavy lifting and generates a .CSV that I upload each week.
What I'm trying to accomplish now is to pull in current and future sales information so we can advertise them. Sales are created in our POS by putting them into groupings called "worksheets." The complicating factor is that a product can exist on multiple worksheets at once, and there's a logic tree that is followed to determine which of the worksheets will affect the scan price at any given time.
The two attributes that determine precedence are "priority" and "date committed".
So, given data that looks like this:
item_id worksheet_name priority date_committed sale_start_date sale_end_date sale_price
011259904209 A 2 2016-06-22 09:21:09.041 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 B 2 2016-06-22 09:49:31.722 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 2.0000
074682105322 C 2 2016-06-22 08:57:04.641 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 2.0000
042563013660 A 2 2016-06-22 09:21:09.048 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.9900
042563013660 D 1 2016-06-25 14:03:33.499 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
042563013660 E 2 2016-06-22 08:49:13.515 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.9900
073360772054 A 2 2016-06-22 09:21:09.114 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.9900
073360772054 B 2 2016-06-22 09:49:31.831 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 3.9900
073360772054 E 2 2016-06-22 08:49:13.520 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.9900
073360772054 C 2 2016-06-22 08:57:04.649 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 3.9900
012993221010 A 2 2016-06-22 09:21:09.110 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 3.3900
012993221010 B 2 2016-06-22 09:49:31.828 2016-07-20 00:00:00.000 2016-08-03 11:00:00.000 3.3900
012993221010 D 1 2016-06-25 14:03:33.502 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
012993221010 E 2 2016-06-22 08:49:13.517 2016-06-28 00:00:00.000 2016-07-20 16:00:00.000 3.3900
012993221010 C 2 2016-06-22 08:57:04.646 2016-07-19 00:00:00.000 2016-08-03 16:00:00.000 3.3900
I want to get this:
Run on 6/27
item_id worksheet_name sale_start_date sale_end_date sale_price
011259904209 A 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 C 2016-07-19 00:00:00.000 2016-07-20 00:00:00.000 2.0000
042563013660 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
073360772054 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
012993221010 E 2016-06-28 00:00:00.000 2016-06-29 00:00:00.000 3.9900
Run on 6/29
item_id worksheet_name sale_start_date sale_end_date sale_price
011259904209 A 2016-06-29 00:00:00.000 2016-07-20 11:00:00.000 2.0000
074682105322 C 2016-07-19 00:00:00.000 2016-07-20 00:00:00.000 2.0000
042563013660 D 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
073360772054 A 2016-06-29 00:00:00.000 2016-07-19 00:00:00.000 3.9900
012993221010 D 2016-06-29 00:00:00.000 2016-07-05 23:59:59.000 2.9900
Bonus points for combining overlapping sale periods to reflect the shopper's perception, but that's not necessary.
How can I get this result using SQL? Our sales run Wednesday through Tuesday, and I'm ideally generating my data file for the coming week on Wednesday afternoon or Thursday morning after we finish our weekly price changes.
We have tens of thousands of products in file.
Here's a graphical representation of the worksheet priorities per day
This is for SQL Anywhere 10, and running SELECT @@VERSION
tells me 12.0.1.3967
Upvotes: 0
Views: 51
Reputation: 32220
I'm mostly familiar with SQL Server, but Sybase's SQL Anywhere is still very similar.
This is the "greatest N per group" problem. The typical solution (in anything other than MySQL) is to use ROW_NUMBER(), which is available on SQL Anywhere 10 and later as far as I can tell from Sybase books online.
SELECT a.item_id,
a.worksheet_name,
a.sale_start_date,
a.sale_end_date,
a.sale_price
FROM (
SELECT item_id,
worksheet_name,
sale_start_date,
sale_end_date,
sale_price,
ROW_NUMBER() OVER (PARTITION BY item_id ORDER BY priority, date_committed DESC) AS rn
FROM UnnamedSalesTable
WHERE sale_start_date <= CURRENT DATE
AND sale_end_date > CURRENT DATE) a
WHERE a.rn = 1
Obviously, you can replace CURRENT DATE
with whatever date value you want to run.
If you happen to have two sales with an identical priority
and date_committed
, then you'll still only get one row. In other words, if there are duplicates you won't know. I suspect that's what you want. However, if you need to see duplicates, then instead of ROW_NUMBER() you'd want to use RANK() or DENSE_RANK() (either work in this case). That will allow "ties" to show up. Otherwise the query would be identical. If this duplication happens a lot, then you'll want to add a third column to the ORDER BY portion of the WINDOW clause.
Upvotes: 1