Brisi
Brisi

Reputation: 1821

How to write a hive script to return the data in the expected format

impression  lineid1 cookieid1
impression  lineid1 cookieid2
impression  lineid1 cookieid3
impression  lineid1 cookieid4
impression  lineid1 cookieid1
impression  lineid2 cookieid1
impression  lineid2 cookieid2
impression  lineid3 cookieid1
impression  lineid1 cookieid1
impression  lineid3 cookieid3
impression  lineid4 cookieid1
impression  lineid4 cookieid2
impression  lineid1 cookieid1
impression  lineid4 cookieid3
impression  lineid4 cookieid4
conversion      cookieid1
conversion      cookieid2
conversion      cookieid3
conversion      cookieid3

All events(ie,impression, conversion) have cookie_ids. For all events in "impression", I would like to know how many times those cookie_id(s) occur in conversion.

I want the o/p like this:

line_id1    cookie_id1  count=1
line_id1    cookie_id2  count=1
line_id1    cookie_id3  count=1
line_id2    cookie_id1  count=1
line_id2    cookie_id2  count=1

How to write a hive script to return the data in the expected format

Upvotes: 0

Views: 187

Answers (1)

Lukas Vermeer
Lukas Vermeer

Reputation: 5940

Join the table on itself by cookieid. Filter lines where the left row is an impression and the right row a conversion. Group by lineid and cookieid to get the counts.

select lineid, cookieid, count(*) as count
from table t1 join table t2
on t1.cookieid = t2.cookieid
where t1.event = 'impression'
and t2.event = 'conversion'
group by lineid, cookieid

This query will not return combinations for which there is no conversion.

Hope that helps.

Upvotes: 2

Related Questions