Francisco Zarabozo
Francisco Zarabozo

Reputation: 3751

How to GROUP BY consecutive data (date in this case)

I have a products table and a sales table that keeps record of how many items a given product sold during each date. Of course, not all products have sales everyday.

I need to generate a report that tells me how many consecutive days a product has had sales (from the latest date to the past) and how many items it sold during those days only.

I'd like to tell you how many things I've tried so far, but the only succesful (and slow, recursive) ones are solutions inside my application and not inside SQL, which is what I want.

I also have browsed several similar questions on SO but I haven't found one that lets me have a clear idea of what I really need.

I've setup a SQLFiddle here to show you what I'm talking about. There you will see the only query I can think of, which doesn't give me the result I need. I also added comments there showing what the result of the query should be.

I hope someone here knows how to accomplish that. Thanks in advance for any comments!

Francisco

Upvotes: 4

Views: 2328

Answers (3)

Cyril Gandon
Cyril Gandon

Reputation: 17048

A solution in pure SQL without store procedure : Fiddle

SELECT sku
     , COUNT(1) AS consecutive_days
     , SUM(items) AS items
FROM
(
  SELECT sku
       , items
       -- generate a new guid for each group of consecutive date
       -- ie : starting with day_before is null
       , @guid := IF(@sku = sku and day_before IS NULL, UUID(), @guid) AS uuid
       , @sku := sku AS dummy_sku
  FROM 
  (
    SELECT currents.sku
         , befores.record_date as day_before
         , currents.items
    FROM sales currents
      LEFT JOIN sales befores 
        ON currents.sku = befores.sku 
        AND currents.record_date = befores.record_date + INTERVAL 1 DAY
    ORDER BY currents.sku, currents.record_date
  )  AS main_join
    CROSS JOIN (SELECT @sku:=0) foo_sku
    CROSS JOIN (SELECT @guid:=UUID()) foo_guid
) AS result_to_group
GROUP BY uuid, sku

The query is really not that hard. Declare variables via cross join (SELECT @type:=0) type. Then in the selects, you can set variables value row by row. It is necessary for simulating Rank function.

Upvotes: 3

Puneet Aggarwsal
Puneet Aggarwsal

Reputation: 1

select
  p.*,
  sum(s.items) sales,
  count(s.record_date) days_sold
from
  products p
join
  sales s
  on
  s.sku = p.sku
where record_date between '2013-04-18 00:00:00' and '2013-04-26 00:00:00'
group by sku; 

Upvotes: -1

Benos
Benos

Reputation: 696

http://sqlfiddle.com/#!2/20108/1

Here is a store procedure that do the job

CREATE PROCEDURE myProc()
BEGIN
    -- Drop and create the temp table
    DROP TABLE IF EXISTS reached;
    CREATE TABLE reached (
    sku CHAR(32) PRIMARY KEY,
    record_date date,
    nb int,
    total int)
   ENGINE=HEAP;

-- Initial insert, the starting point is the MAX sales record_date of each product
INSERT INTO reached 
SELECT products.sku, max(sales.record_date), 0, 0
FROM products
join sales on sales.sku = products.sku
group by products.sku;

-- loop until there is no more updated rows
iterloop: LOOP
    -- Update the temptable with the values of the date - 1 row if found
    update reached
    join sales on sales.sku=reached.sku and sales.record_date=reached.record_date
    set reached.record_date = reached.record_date - INTERVAL 1 day, 
        reached.nb=reached.nb+1, 
        reached.total=reached.total + sales.items;

    -- If no more rows are updated it means we hit the most longest days_sold
    IF ROW_COUNT() = 0 THEN
        LEAVE iterloop;
    END IF;
END LOOP iterloop;

-- select the results of the temp table
SELECT products.sku, products.title, products.price, reached.total as sales, reached.nb as days_sold 
from reached
join products on products.sku=reached.sku;

END//

Then you just have to do

call myProc()

Upvotes: 3

Related Questions