how do i count the number of data in a column for each month using a start and end date

Question

I have a table with similar information and i want to extract the data like this when a user selects a start date '2015-01-22' and end date '2015-07-31' . The result should look like this.

Month        Total Quantity    
January: 8
February: 6
March: 0
April: 0
May: 2
June: 18
July: 6

Here's a sample query and fiddle

CREATE TABLE orders
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  order_date DATE,
  product_id INT,
  quantity INT,
  customer_id INT
);
INSERT INTO orders (order_date, product_id, quantity, customer_id)
  VALUES
  ('2015-01-01', 1, 2, 123),
  ('2015-01-06', 3, 6, 123),
  ('2015-02-14', 2, 4, 123),
  ('2015-02-15', 2, 2, 123),
  ('2015-05-16', 1, 1, 456),
  ('2015-05-17', 1, 1, 456),
  ('2015-06-18', 1, 5, 789),
  ('2015-06-18', 3, 7, 123),
  ('2015-06-10', 3, 6, 123),
  ('2015-07-13', 1, 5, 456),
  ('2015-07-14', 1, 1, 456);

http://sqlfiddle.com/#!2/01ac19/1

The results should be the total number of quantity of Orders per month

Clockwork-Muse · Accepted Answer

First, what you ~~want~~ NEED is known as a "Calendar table". They are, hands down, the most useful Analysis table you can make. Their individual definition and data fill varies, and won't be covered here, but for our purposes, we'll use the following minimum definition:

CREATE TABLE Calendar (calendarDate DATE PRIMARY KEY,
                       year INTEGER,
                       month INTEGER
                       dayOfMonth INTEGER);

... and it's filled with data like you expect (insert every single date from when your business started, to a reasonable point in the future). You also want indices on this - lots of indices.

Next, you need to consider something important about databases: they can't use indices if function output is used as criteria. Basically, if it's not in the SELECT clause, using a function (even via some implicit casts) makes your query slower. So, doing things like YEAR(order_date) should be avoided.
How then do we aggregate by things like year or month? Via ranges queries. If the database has an index, it's pretty cheap for it to look up the start and end of a range (and nicely parallelizable, too). In our case, the range is >= startOfMonth to < startOfNextMonth. We can now build an in-process range table:

SELECT year, month, 
       calendarDate AS monthStart, 
       calendarDate + INTERVAL 1 MONTH AS nextMonthStart
FROM Calendar
WHERE dayOfMonth = 1
      AND calendarDate >= :queryStartRange
      AND calendarDate < :queryEndRange

... where the : is denoting start-of-month values, which is left as an exercise for the reader.

Now, remember how I said "no functions"? calendarDate + INTERVAL 1 MONTH actually counts. However, it's not going to matter here; the resulting table is so small (just 12 rows per year!) that good RDBMSs can place the contents in memory for faster results (because it would take longer to hit just the index).

Now that we have our range-query table, we can join it to the Orders ("fact") table;

SELECT DRange.year, DRange.month, SUM(Orders.quantity) AS total_quantity
FROM (SELECT year, month, 
             calendarDate AS monthStart, 
             calendarDate + INTERVAL 1 MONTH AS nextMonthStart
      FROM Calendar
      WHERE dayOfMonth = 1
            AND calendarDate >= :queryStartRange
            AND calendarDate < :queryEndRange) AS DRange
JOIN Orders
  ON Orders.order_date >= DRange.monthStart
     AND Orders.order_date < DRange.nextMonthStart
GROUP BY DRange.year, DRange.month
ORDER BY DRange.year, DRange.month

Example Fiddle
(fun trick: using LEFT JOIN instead of JOIN will net you null-quantity rows if a month has no orders - like for March and April in your example data)

So what does this get us? Range-query access at the base data, which will make for a faster query. If, for some reason, order_date gets turned into a timestamp, the query is completely safe - we'll correctly get all orders, and put them in the proper months.

how do i count the number of data in a column for each month using a start and end date

Answers (2)

Related Questions