PaulStock
PaulStock

Reputation: 11263

Is there more efficient way of grouping by date in SQL Server 2008

I have to create a report which has AccountSegment as rows and a 2-week date range as column header. The column values will be a count of the number of records in the table having the associated segment/date range.

So the desired output looks something like this:

AcctSeg  4/9/12-4/20/12   4/23/12-5/4/12   5/7/12-5/18/12
Segment1       100             200              300
Segment2       110             220              330
Segment3       120             230              340

The following query does what I want, but just looks so inefficient and ugly. I was wondering if there is a better way to accomplish the same thing:

SELECT
    AccountSegment = S.Segment_Name,
    '4/9/2012 - 4/20/2012' = SUM(CASE WHEN date_start BETWEEN '2012-04-09' AND '2012-04-20' THEN 1 END),
    '4/23/2012 - 5/4/2012' = SUM(CASE WHEN date_start BETWEEN '2012-04-23' AND '2012-05-04' THEN 1 END),
    '5/7/2012 - 5/18/2012' = SUM(CASE WHEN date_start BETWEEN '2012-05-07' AND '2012-05-18' THEN 1 END),
    '5/21/2012 - 6/1/2012' = SUM(CASE WHEN date_start BETWEEN '2012-05-21' AND '2012-06-01' THEN 1 END),
    '6/4/2012 - 6/15/2012' = SUM(CASE WHEN date_start BETWEEN '2012-06-04' AND '2012-06-15' THEN 1 END),
    '6/18/2012 - 6/29/2012' = SUM(CASE WHEN date_start BETWEEN '2012-06-18' AND '2012-06-29' THEN 1 END),
    '7/2/2012 - 7/13/2012' = SUM(CASE WHEN date_start BETWEEN '2012-07-02' AND '2012-07-13' THEN 1 END),
    '7/16/2012 - 7/27/2012' = SUM(CASE WHEN date_start BETWEEN '2012-07-16' AND '2012-07-27' THEN 1 END),
    '7/30/2012 - 8/10/2012' = SUM(CASE WHEN date_start BETWEEN '2012-07-30' AND '2012-08-10' THEN 1 END)
FROM
    dbo.calls C
    JOIN dbo.accounts a ON C.parent_id = a.id
    JOIN dbo.accounts_cstm a2 ON a2.id_c = A.id
    JOIN dbo.Segmentation S ON a2.[2012_segmentation_c] = S.Segment_Num
WHERE
    c.deleted = 0 
GROUP BY
    S.Segment_Name
ORDER BY
    MIN(S.Sort_Order)

Upvotes: 2

Views: 140

Answers (2)

Dale M
Dale M

Reputation: 2473

@PaulStock, happy to do so.

This technique plays to the strengths of RDMS which is data retrieval and set manipulation - leave itteration to other programming languages that are better optimised for it like C#.

First of all you need an IndexTable, I keep mine in the master database but if you do not have write access to this by all means keep it in your db.

It looks like this:

Id
 0
 1
 2
...
n

Where n is a sufficiently large number for all your scenarios, 100,000 is good, 1,000,000 is better, 10,000,000 is even better still. Column id is cluster indexed of course.

I'm not going to relate it directly to your query becuase I don't really get it and I'm too lazy to work it out.

Instead I'll relate it to this table called Transactions, where we want to roll up all the transactions that happened on each day (or week or month etc):

Date                       Amount
2012-18-12 04:58:56.453       10
2012-18-12 06:34:21.456      100
etc

The following query will roll up the data by day

SELECT  i.Id, SUM(t.Amount) AS DailyTotal
FROM    IndexTable i
        INNER JOIN
        Transactions t ON i.Id=DATEDIFF(DAY, 0, t.Date)
GROUP BY i.Id

The DATEDIFF function returns the number of dateparts between 2 dates, in this case the number of days between 1900-01-01 0:00:00.000 (DateTime = 0 in SQL Server) and the Date of the transaction (btw there have been 41,261 days since then - see why we need a big table)

All the transactions on the same day will have the same number. Changing to week or month or second (a very big number) is as easy as changing the datepart.

You can put in a startdate later than this of course so long as it is earlier than the data you are interested in but it makes litte to no differance to performance.

I have used an INNER JOIN here so if there are no transactions on a given day then we have no row but a LEFT JOIN will give these empty dates with NULL as the Total (use an ISNULL statement if you want to get 0.

With the normalised data you can then PIVOT as desired to get the output you are looking for.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269753

It looks fine, but I would suggest one performance improvement:

where c.deleted = 0 and
      date_start between  '2012-04-09' AND '2012-08-10'

This will limit the aggregation only to rows you need . . . unless you want everything listed with empty data.

I would be inclined to add else 0 to the case statements, so 0s appear instead of NULLs.

Upvotes: 2

Related Questions