Reputation: 8584
I'm trying to take transactional data and cleanse it to meet my analysis needs. There are some limitations to how transactions are recorded into the database, and I am trying to get around those limitations.
When a customer places an order with more than 1 product, the transactional database doesn't link the multiple products together. Each product will have a unique sales ID, but there is no way to group multiple sales ID into 1 order. Here is a sample:
OrderID MultOrderID CustomerID SalesDate SalesTime ProductID ProductCost ShippingCost
6082346 7661X0A 2012-06-12 959 105 99.99 7.95
6082347 5809812YY6Y T891002 2012-06-12 1005 222 99.95 7.95
6082348 5809812YY6Z T891002 2012-06-12 1005 273 22.95 1.00
6082349 5809812YY71 T891002 2012-06-12 1005 285 499.95 1.00
6082350 5809812YY72 T891002 2012-06-12 1005 172 49.95 1.00
6082351 5809812YY73 T891002 2012-06-12 1005 105 99.99 7.95
6082352 5809812YY74 X637251 2012-06-12 1010 285 499.95 7.95
6082353 5809812YY75 X637251 2012-06-12 1010 30 1024.99 1.00
6082354 T512AT0 2012-06-12 1017 172 49.95 7.95
An additional limitation to this transaction system is that it can not ship more than 4 products together. If the customer places an order for 5 products, 4 products are shipped together (and charged 1 shipping charge), the remaining product is shipped separately and charged another shipping charge (yes, the overall business wants to rebuild this entire legacy system....).
What I am trying to determine is the number of products shipped per order, and the aggregate product costs and shipping costs.
If you look at the last 4 characters of the MultOrderID, you'll see that it's sequential, YY6Y becomes YY6Z, then rolls over to YY71, YY72. The logic is standardized - I know that if the CustomerID, SalesDate and SalesTime are the same, then I can pair off the products together. What I don't know is HOW I can accomplish this.
I believe the way to accomplish this is to break out the orders by CustomerID, SalesDate and SalesTime. Then, I get a for-loop
, or something like that to cycle through the individual entries. Then, I look for the last 4 characters of the MultOrderID and say - If 1,2 and 3 are the same, and the 4th character is after the 4th character of the previous order, then pair it together, up to 4 orders. If the orderID is the 5th to 8th order in the range, then that's shipment 2, etc.
Can this be done in SQL Server
? If not in that, what should I write this in? And is a for-loop
what I should be using in this case?
Edit: Here is the output I am trying to get to. Keep in mind that after the 4th product was shipped, I need to restart the ordering (so, 6 products get broken into 2 shipments [4 products and 2 products], 9 products into 3 shipments [4, 4, and 1].
PRODUCTSSHIPPED SALESDATE SALESTIME CUSTOMERID PRODUCTCOST SHIPPINGCOST
4 6/12/12 1005 T891002 672.8 10.95
1 6/12/12 1005 T891002 99.99 7.95
2 6/12/12 1010 X637251 1524.94 8.95
1 6/12/12 1017 T512AT0 49.95 7.95
1 6/12/12 959 7661X0A 99.99 7.95
Upvotes: 0
Views: 1855
Reputation: 14012
Well from this statement it seems like you want this:
What I am trying to determine is the number of products shipped per order, and the aggregate product costs and shipping costs.
http://sqlfiddle.com/#!3/e0e71/30
So I'm not sure what you mean by using a foreach loop?
UPDATE:
Got it working using a subquery and ceiling function
Updated the fiddle
FYI SQL is:
SELECT
SalesDate,
SalesTime,
CustomerID,
SUM(ProductCost),
SUM(ShippingCost)
FROM
(
SELECT
SalesDate,
SalesTime,
CustomerID,
ProductCost,
ShippingCost,
ROW_NUMBER() OVER (PARTITION BY salesdate, salestime, customerid ORDER BY CustomerID) as ProdNumber
FROM Orders
) as Summary
group by SalesDate, SalesTime, CustomerID, ceiling(ProdNumber / 4.0)
I used ROW_NUMBER to get a running count of products for each order, then made this a subquery so I could do the grouping. The grouping just used the number of products divided by 4 (as float) and uses the ceiling function to round up to the nearest int to get it grouping into 4
Upvotes: 2
Reputation: 35323
I'm not sure why a loop is needed at all..
Select count(*) as ProductsOnOrder, LEFT(CustomerID,4), as CID,
SalesDate, SalesTime, sum(productCost), sum(ShippingCost)
FROM YOUR_TABLENAME
GROUP BY left(CustomerID,4), salesdate, salestime
What order number do you want to display? Min? Max? All of em? what? same question on products do you want to list the products or just count them?
Select count(*) as ProductsOnOrder, LEFT(CustomerID,4), as CID,
SalesDate, SalesTime, sum(productCost), sum(ShippingCost),
min(orderID), Max(orderID)
FROM YOUR_TABLENAME
GROUP BY left(CustomerID,4), salesdate, salestime
Since you know orderID is sequential for each line on the order you could return min/max and subtract the two as well to get a count.
Upvotes: 0
Reputation: 2123
I dont think you need a loop here. Usually its considered as a bad practice in sql, unless completly unavoidable. Can you assume that if an order is made by a user at the same datetime exactly it belongs to the same logical order (order group)? Anyway, the whole problem can probably be solved using SQL server's partition and over clauses. Look at sample D there, i think its doing something close to what you need.
EDIT
The range clause is availibale only in sql 2012, however you can still use partioning and rownumber, and then group by ur results by using simple calculation (ROWNUMBER / 4) on the returned rownumber
Upvotes: 1
Reputation: 5425
This should give you the number of orders for that customer/date/time in the NumOrders field. It uses my new favorite function, Row_Number:
SELECT [CUSTOMERID], [SALESDATE], [SALESTIME], MAX(NumOrders)
FROM (
SELECT [CUSTOMERID],
[SALESDATE],
[SALESTIME],
ROW_NUMBER() OVER(PARTITION BY [CUSTOMERID], [SALESDATE], [SALESTIME] ORDER BY [CUSTOMERID]) AS NumOrders
) t1
GROUP BY [CUSTOMERID], [SALESDATE], [SALESTIME]
Upvotes: 1