Treat missing rows as having 0 data

Question

Suppose I have a table describing workers' contribution to projects

project worker contribution
1       1      2
1       2      3
2       1      4

To compute the worker impact, I can do

select t.project, t.worker, t.contribution, p.total, 
       t.contribution / p.total as relative 
from my_table t
join (select project, sum(contribution) as total
      from my_table group by project) p
on t.project = p.project

to get new_table

project worker contribution total relative
1       1      1            5     .2
1       2      4            5     .8
2       1      4            4     1

if I now compute the average relative contribution using

select worker, avg(relative) as avg_rel
from new_table group by worker

I will see

worker avg_rel
1      .6
2      .8

which disregards the 0 contribution of worker 2 to project 2.

How do I take that into account? I.e., I want to get

worker avg_rel
1      .6
2      .4

as if the original table also included the line

2       2      0

Thanks.

Gordon Linoff · Accepted Answer

I'm not sure this will work in Hive, but here is a SQL solution:

select w.worker, avg(coalesce(t.relative, 0.0)) as avg_rel
from (select distinct project from my_table) p cross join
     (select distinct worker from my_table) w left outer join
     my_table t
     on t.project = p.project and t.worker = w.worker
group by w.worker;

Treat missing rows as having 0 data

Answers (2)

Related Questions