Reputation: 858
I have a table with similar information and i want to extract the data like this when a user selects a start date '2015-01-22' and end date '2015-07-31' . The result should look like this.
Month Total Quantity
January: 8
February: 6
March: 0
April: 0
May: 2
June: 18
July: 6
Here's a sample query and fiddle
CREATE TABLE orders
(
id INT PRIMARY KEY AUTO_INCREMENT,
order_date DATE,
product_id INT,
quantity INT,
customer_id INT
);
INSERT INTO orders (order_date, product_id, quantity, customer_id)
VALUES
('2015-01-01', 1, 2, 123),
('2015-01-06', 3, 6, 123),
('2015-02-14', 2, 4, 123),
('2015-02-15', 2, 2, 123),
('2015-05-16', 1, 1, 456),
('2015-05-17', 1, 1, 456),
('2015-06-18', 1, 5, 789),
('2015-06-18', 3, 7, 123),
('2015-06-10', 3, 6, 123),
('2015-07-13', 1, 5, 456),
('2015-07-14', 1, 1, 456);
http://sqlfiddle.com/#!2/01ac19/1
The results should be the total number of quantity of Orders per month
Upvotes: 0
Views: 75
Reputation: 13056
First, what you want NEED is known as a "Calendar table". They are, hands down, the most useful Analysis table you can make. Their individual definition and data fill varies, and won't be covered here, but for our purposes, we'll use the following minimum definition:
CREATE TABLE Calendar (calendarDate DATE PRIMARY KEY,
year INTEGER,
month INTEGER
dayOfMonth INTEGER);
... and it's filled with data like you expect (insert every single date from when your business started, to a reasonable point in the future). You also want indices on this - lots of indices.
Next, you need to consider something important about databases: they can't use indices if function output is used as criteria. Basically, if it's not in the SELECT
clause, using a function (even via some implicit casts) makes your query slower. So, doing things like YEAR(order_date)
should be avoided.
How then do we aggregate by things like year or month? Via ranges queries. If the database has an index, it's pretty cheap for it to look up the start and end of a range (and nicely parallelizable, too). In our case, the range is >= startOfMonth
to < startOfNextMonth
. We can now build an in-process range table:
SELECT year, month,
calendarDate AS monthStart,
calendarDate + INTERVAL 1 MONTH AS nextMonthStart
FROM Calendar
WHERE dayOfMonth = 1
AND calendarDate >= :queryStartRange
AND calendarDate < :queryEndRange
... where the :
is denoting start-of-month values, which is left as an exercise for the reader.
Now, remember how I said "no functions"? calendarDate + INTERVAL 1 MONTH
actually counts. However, it's not going to matter here; the resulting table is so small (just 12 rows per year!) that good RDBMSs can place the contents in memory for faster results (because it would take longer to hit just the index).
Now that we have our range-query table, we can join it to the Orders
("fact") table;
SELECT DRange.year, DRange.month, SUM(Orders.quantity) AS total_quantity
FROM (SELECT year, month,
calendarDate AS monthStart,
calendarDate + INTERVAL 1 MONTH AS nextMonthStart
FROM Calendar
WHERE dayOfMonth = 1
AND calendarDate >= :queryStartRange
AND calendarDate < :queryEndRange) AS DRange
JOIN Orders
ON Orders.order_date >= DRange.monthStart
AND Orders.order_date < DRange.nextMonthStart
GROUP BY DRange.year, DRange.month
ORDER BY DRange.year, DRange.month
Example Fiddle
(fun trick: using LEFT JOIN
instead of JOIN
will net you null-quantity rows if a month has no orders - like for March and April in your example data)
So what does this get us? Range-query access at the base data, which will make for a faster query. If, for some reason, order_date
gets turned into a timestamp, the query is completely safe - we'll correctly get all orders, and put them in the proper months.
Upvotes: 1
Reputation: 4844
Try this query
SELECT MONTHNAME(DATE(order_date)) AS dateinfo, SUM(quantity) AS total_sales
FROM orders
GROUP BY dateinfo
Upvotes: 0