Reputation: 1821
impression lineid1 cookieid1
impression lineid1 cookieid2
impression lineid1 cookieid3
impression lineid1 cookieid4
impression lineid1 cookieid1
impression lineid2 cookieid1
impression lineid2 cookieid2
impression lineid3 cookieid1
impression lineid1 cookieid1
impression lineid3 cookieid3
impression lineid4 cookieid1
impression lineid4 cookieid2
impression lineid1 cookieid1
impression lineid4 cookieid3
impression lineid4 cookieid4
conversion cookieid1
conversion cookieid2
conversion cookieid3
conversion cookieid3
All events(ie,impression, conversion) have cookie_ids. For all events in "impression", I would like to know how many times those cookie_id(s) occur in conversion.
I want the o/p like this:
line_id1 cookie_id1 count=1
line_id1 cookie_id2 count=1
line_id1 cookie_id3 count=1
line_id2 cookie_id1 count=1
line_id2 cookie_id2 count=1
How to write a hive script to return the data in the expected format
Upvotes: 0
Views: 187
Reputation: 5940
Join the table on itself by cookieid. Filter lines where the left row is an impression and the right row a conversion. Group by lineid and cookieid to get the counts.
select lineid, cookieid, count(*) as count
from table t1 join table t2
on t1.cookieid = t2.cookieid
where t1.event = 'impression'
and t2.event = 'conversion'
group by lineid, cookieid
This query will not return combinations for which there is no conversion.
Hope that helps.
Upvotes: 2