Reputation: 67
For the purposes of this question I have 2 tables.
Most of the fields are varchar and allow nulls because these are flat files imported from custom server logs and sometimes unexpected characters can leak into one field from another - I can't control that. My requirement is to provide a Site report detailing the Unique Sites, Ad Tag, & Count of times each ad(Tag) ran on each site, along with a count of user interactions with the ad, plus a count of clicks, which is defined as one of 3 interaction_id's (1,2,or 3).
Running a separate query to check the actual data in Requests:
SELECT Convert(CHAR(10), r.Timestamp,101) AS [Date]
, r.Site, r.CampaignID
, r.TagName, Count(r.TagName) AS TagCount
FROM Requests AS r
WHERE r.CampaignID = 1901
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) , r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, r.TagName
yields:
Date Site CampaignID TagName TagCount
09/01/2013 Site1 1901 Tag1 61
09/01/2013 Site1 1901 Tag2 3703
09/01/2013 Site1 1901 Tag3 279
09/01/2013 Site2 1901 Tag1 30
...
To get the count of interactions I tried...
SELECT CONVERT(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID, r.TagName, Count(r.TagName) As TagCount, Count(e.InteractionID) AS InteractionCount
FROM Requests AS r
LEFT JOIN Events as e ON r.SessionID = e.SessionID
WHERE r.CampaignID = 9101
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) ,r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, e.TagName
yields:
Date Site CampaignID TagName TagCount InteractionCount
09/01/2013 Site1 9101 Tag1 130 78
09/01/2013 Site1 9101 Tag2 3805 352
09/01/2013 Site1 9101 Tag3 307 53
09/01/2013 Site2 9101 Tag1 30 0
...
My questions/issues. TagCount has gone up, that surprises me. I can understand how it would go down as a result of the join condition, but why up? It seems to me that the first query should be counting all available records, the second shouldn't find more to count. It must be something I don't understand about a left join. So how do I get TagCount from Requests so that it just counts all of them, and then get InteractionCount from Events for each Site & TagName. I'm not familiar enough with SQL to come up with a sub-query that will achieve the right result but I guess that's where the answer lies.
The last part of my problem is Clicks
Where e.InteractionID = '1' OR e.InteractionID = '2' OR e.InteractionID = '3'
restricts my result set to only those records which contain clicks, but I need the rest of the records from the left side, with 0's if there are no clicks and the sum of the counts of any that meet those conditions.
I hope I've been clear about the problems, and I appreciate your time and energy, Thanks in advance.
Upvotes: 3
Views: 705
Reputation: 117636
if you timestamp is unique in groups, then:
select
convert(char(10), r.Timestamp, 101) as [Date],
r.Site, r.CampaignID, r.TagName,
count(distinct r.SID) as TagCount,
count(e.InteractionID) as InteractionCount,
count(case when e.InteractionID in (1, 2, 3) then e.InteractionID end) as ClicksCount
from Requests as r
left outer join Events e on e.SessionID = r.SessionID
where r.CampaignID = 9101
group by
convert(char(10), r.Timestamp, 101), r.Site, r.CampaignID, r.TagName
order by [Date], Site, CampaignID, TagName;
Upvotes: 2
Reputation: 1271231
The reason tag count is going up is because you are counting rows, not distinct values. You can fix this by using count(distinct)
. The question is what is the right argument. Obviously, tagName
isn't the right argument, because you are grouping by that field. A request Id would work, but it is not available. This might work:
SELECT CONVERT(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID, r.TagName,
Count(distinct r.sID) As TagCount, Count(e.InteractionID) AS InteractionCount
FROM Requests r LEFT JOIN
Events e
ON r.SessionID = e.SessionID
WHERE r.CampaignID = 9101
GROUP BY CONVERT(CHAR(10), r.Timestamp,101) ,r.Site, r.CampaignID, r.TagName
ORDER BY [Date], r.Site, r.CampaignID, e.TagName;
In other words, when you do the join, you are getting duplicated rows from the left join
. These are being counting as multiples.
If the above doesn't work, you need to do the calculation as a subquery, then do the join:
select r.[Date], r.Site, r.CampaignID, r.TagName, r.TagCount,
count(e.InteractionID) as InteractionCount
from (SELECT Convert(CHAR(10), r.Timestamp,101) AS [Date], r.Site, r.CampaignID
r.TagName, Count(r.TagName) AS TagCount
FROM Requests r
WHERE r.CampaignID = 1901
GROUP BY CONVERT(CHAR(10), r.Timestamp, 101), r.Site, r.CampaignID, r.TagName
) r join
events e
ON r.SessionID = e.SessionID
ORDER BY [Date], r.Site, r.CampaignID, r.TagName;
Upvotes: 1