Jessie Boone
Jessie Boone

Reputation: 67

Select ALL records from one table and display 0's for rows not matching a condition in another table

For the purposes of this question I have 2 tables.

  1. Requests: Timestamp, SessionID, Site, CampaignID, TagName
  2. Events: Timestamp, SessionID, CampaignID, TagName, InteractionID

Most of the fields are varchar and allow nulls because these are flat files imported from custom server logs and sometimes unexpected characters can leak into one field from another - I can't control that. My requirement is to provide a Site report detailing the Unique Sites, Ad Tag, & Count of times each ad(Tag) ran on each site, along with a count of user interactions with the ad, plus a count of clicks, which is defined as one of 3 interaction_id's (1,2,or 3).

Running a separate query to check the actual data in Requests:

SELECT Convert(CHAR(10), r.Timestamp,101) AS [Date]
    , r.Site, r.CampaignID
    , r.TagName, Count(r.TagName) AS TagCount

FROM Requests AS r
WHERE r.CampaignID = 1901 
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) , r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, r.TagName

yields:

Date        Site    CampaignID   TagName TagCount
09/01/2013  Site1   1901          Tag1    61
09/01/2013  Site1   1901          Tag2    3703
09/01/2013  Site1   1901          Tag3    279
09/01/2013  Site2   1901          Tag1    30
...

To get the count of interactions I tried...

SELECT CONVERT(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID, r.TagName, Count(r.TagName) As TagCount, Count(e.InteractionID) AS InteractionCount
FROM Requests AS r
LEFT JOIN Events as e ON r.SessionID = e.SessionID
WHERE r.CampaignID = 9101 
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) ,r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, e.TagName

yields:

Date        Site    CampaignID  TagName TagCount    InteractionCount
09/01/2013  Site1   9101        Tag1    130         78
09/01/2013  Site1   9101        Tag2    3805        352
09/01/2013  Site1   9101        Tag3    307         53
09/01/2013  Site2   9101        Tag1    30          0
...

My questions/issues. TagCount has gone up, that surprises me. I can understand how it would go down as a result of the join condition, but why up? It seems to me that the first query should be counting all available records, the second shouldn't find more to count. It must be something I don't understand about a left join. So how do I get TagCount from Requests so that it just counts all of them, and then get InteractionCount from Events for each Site & TagName. I'm not familiar enough with SQL to come up with a sub-query that will achieve the right result but I guess that's where the answer lies.

The last part of my problem is Clicks

Where e.InteractionID = '1' OR e.InteractionID = '2' OR e.InteractionID = '3'

restricts my result set to only those records which contain clicks, but I need the rest of the records from the left side, with 0's if there are no clicks and the sum of the counts of any that meet those conditions.

I hope I've been clear about the problems, and I appreciate your time and energy, Thanks in advance.

Upvotes: 3

Views: 705

Answers (2)

roman
roman

Reputation: 117636

if you timestamp is unique in groups, then:

select
    convert(char(10), r.Timestamp, 101) as [Date],
    r.Site, r.CampaignID, r.TagName,
    count(distinct r.SID) as TagCount,
    count(e.InteractionID) as InteractionCount,
    count(case when e.InteractionID in (1, 2, 3) then e.InteractionID end) as ClicksCount
from Requests as r
    left outer join Events e on e.SessionID = r.SessionID
where r.CampaignID = 9101 
group by
    convert(char(10), r.Timestamp, 101), r.Site, r.CampaignID, r.TagName
order by [Date], Site, CampaignID, TagName;

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1271231

The reason tag count is going up is because you are counting rows, not distinct values. You can fix this by using count(distinct). The question is what is the right argument. Obviously, tagName isn't the right argument, because you are grouping by that field. A request Id would work, but it is not available. This might work:

SELECT CONVERT(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID, r.TagName,
       Count(distinct r.sID) As TagCount, Count(e.InteractionID) AS InteractionCount
FROM Requests r LEFT JOIN
     Events e
     ON r.SessionID = e.SessionID
WHERE r.CampaignID = 9101 
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) ,r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, e.TagName;

In other words, when you do the join, you are getting duplicated rows from the left join. These are being counting as multiples.

If the above doesn't work, you need to do the calculation as a subquery, then do the join:

select r.[Date], r.Site, r.CampaignID, r.TagName, r.TagCount,
       count(e.InteractionID) as InteractionCount
from (SELECT Convert(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID
             r.TagName, Count(r.TagName) AS TagCount
      FROM Requests r
      WHERE r.CampaignID = 1901 
      GROUP BY CONVERT(CHAR(10), r.Timestamp, 101), r.Site, r.CampaignID, r.TagName
     ) r join
     events e
     ON r.SessionID = e.SessionID
ORDER BY [Date], r.Site, r.CampaignID, r.TagName;

Upvotes: 1

Related Questions