Reputation: 5803
Note: This questions seems to have gotten decent views so I thought it's better I update this question for clarity. Most changes are cosmetic, but the only major change is that I added month column to the weights_table. Weights tables are monthly tables so technically it doesn't matter, but I guess having month column in both tables would make table relationship more obvious and logical
Problem
I have this query that uses two tables [person_table] and [weights_table].
select
a.month,
a.movie,
count(a.person_id) as raw,
sum(b.weight) as weighted,
sum(b.weight)/sum(sum(b.weight)) over () as share -- I need to change this calculation
from (select distinct month,
movie,
person_id
from person_table) a
join weights_table b on a.month=b.month and a.person_id=b.person_id
group by a.month, a.movie;
What I want is to change that last calculation such that the denominator sum(sum(b.weight)) over ()
is calculated as sum of the sum of weights for (distinct person_ids per month) instead of sum of the sum of weights for (distinct person_ids per movie per month). Is there an easier way to accommodate that without adding another subquery?
Sample of person_table
+-------+-------+-----------+
| month | movie | person_id |
+-------+-------+-----------+
| 1 | a | 1 |
| 1 | b | 1 |
| 1 | b | 2 |
| 1 | a | 2 |
| 1 | c | 3 |
| 1 | d | 4 |
| 1 | a | 2 |
| 1 | c | 3 |
| 1 | a | 6 |
+-------+-------+-----------+
Sample of weights_table
+-------+-----------+--------+
| month | person_id | weight |
+-------+-----------+--------+
| 1 | 1 | 12 |
| 1 | 2 | 34 |
| 1 | 3 | 65 |
| 1 | 4 | 76 |
| 1 | 7 | 96 |
+-------+-----------+--------+
Expected result
+-------+-------+-----+----------+-------+
| month | movie | raw | weighted | share |
+-------+-------+-----+----------+-------+
| 1 | a | 2 | 46 | 0.25 | --(12+34)/(12+34+65+76)=0.25
| 1 | b | 2 | 46 | 0.25 |
| 1 | c | 1 | 64 | 0.35 |
| 1 | d | 1 | 76 | 0.41 |
+-------+-------+-----+----------+-------+
Metric definitions:
Raw: Count of all distinct person_ids per month per movie)
Weighted: Sum of weights of distinct person_ids per month per movie).
Share: Ratio of Weighted to (Sum of weights of distinct person_ids per month matched with the persons_table)
Upvotes: 1
Views: 2109
Reputation: 11
Maybe something like:
select a.month,
a.movie,
count(a.person_id) as raw,
sum(b.weight) as weighted,
100*weighted/c.ttl_weight as share
from (select distinct month, movie, person_id from person_table) a
inner join weights_table b on a.person_id=b.person_id
cross join (select sum(weight) as ttl_weight from weights_table w
where exists (select 1
from person_table p
where w.person_id=p.person_id)
) c
group by a.month, a.movie, c.ttl_weight
;
Upvotes: 0
Reputation: 25968
Ah, with only one months data in a the table, and decomposing the subselect to CTE's to see if I can see a pattern. I don't see any.. and thus it seems like a how you like your SQL thing (to me)
with person_table as (
select column1 as month, column2 as movie, column3 as person_id, column4 as unique_visit_id
from values (1, 'a', 1, 1),
(1, 'b', 1, 2),
(1, 'b', 2, 3),
(1, 'a', 2, 4),
(1, 'c', 3, 5),
(1, 'd', 4, 6),
(1, 'a', 2, 7),
(1, 'c', 3, 8),
(1, 'a', 6, 9)
), weight_table as (
select column1 as person_id, column2 as weight
from values (1, 12), (2, 34), (3, 65), (4, 76), (999,999)
), dis_month_people as (
select distinct month, person_id
from person_table
), month_share as (
select month, sum(weight) as total_weight
from dis_month_people dp
join weight_table w on dp.person_id = w.person_id
group by 1
), dis_month_movie_people as (
select distinct month, movie, person_id
from person_table
)
select t.* --, weighted, total_weight
,t.weighted/m.total_weight as share
from (
select
a.month,
a.movie,
count(a.person_id) as raw,
sum(b.weight) as weighted
from dis_month_movie_people a
join weight_table b on a.person_id = b.person_id
group by 1,2
) AS t
join month_share m on t.month = m.month
order by 1,2;
Upvotes: 1