JMax2012
JMax2012

Reputation: 373

SQL Left Join, Cross Join - Find a Missing Value

This should be really easy, but I just can't spot the answer (I'll blame it on the fact that I've been SQL coding for almost 14 hours now).

I have a table called report with a "Label" field and a "ReportYear" field. I know that I have three possible "Label" values, but not every label will have an entry for every year.

I can select a distinct label list and I get:

'Regular'
'Special'
'None'

Doing some manual data selection here, I know that 2011 and 2010 have 0 entries with the label 'Special', but I need a query to pull this information together, so that it comes out:

'Regular' - '2012' - '5'
'Regular' - '2011' - '2'
'Regular' - '2010' - '1'
'Special' - '2012' - '3'
'Special' - '2011' - '0'
'Special' - '2010' - '0'
'None' - '2012' - '10'
'None' - '2011' - '5'
'None' - '2010' - '2'

Hopefully that makes sense.

I know I can SELECT Count(*), Label FROM (SELECT DISTINCT Label FROM Report) t1 ... but then? LEFT JOIN Report t2 ON t1.Label=t2.Label? CROSS JOIN?

My brain is fried.

Help?

Upvotes: 1

Views: 537

Answers (1)

ErikE
ErikE

Reputation: 50271

SELECT
   L.Label,
   Y.ReportYear,
   ReportCount = Count(R.Label)
FROM
   (SELECT DISTINCT Label FROM dbo.Report) L
   CROSS JOIN (SELECT DISTINCT ReportYear FROM dbo.Report) Y
   LEFT JOIN dbo.Report R
      ON L.Label = R.Label
      AND Y.ReportYear = R.ReportYear
GROUP BY
   L.Label,
   Y.ReportYear

Now, this isn't really ideal because you're doing an entire table scan, twice, just to get the labels and years.

Unless I am misunderstanding things, it seems to me that you should normalize the Report table so it has a ReportLabelID column and then have a ReportLabel table with the distinct labels. Then you can put the ReportLabel table in place of the DISTINCT query above.

And you can also eliminate the ReportYear subquery by parameterizing the whole query to accept BeginYear and EndYear or something like that. Or, you could get the Min(ReportYear), Max(ReportYear) from the table, and assuming you have an index on the column it may be able to turn it into seeks (two separate queries might be needed to get this), then use a numbers table or on-the-fly numbers table to generate the sequence of years between them.

Once you make those two changes, the query will then perform significantly better.

Upvotes: 1

Related Questions