Reputation: 4742
I wish to join two tables, but get average of the joined column only by left table occurrences
document:
+-----+-----+-------+
| dId | name| score |
+-----+-----+-------+
| A | n1 | 100 |
| B | n1 | 70 |
+-----+-----+-------+
entity:
+------+------------+-----+
| ename| details | dId |
+------+------------+-----+
| e1 | a | A |
| e2 | a | A |
| e3 | b | A |
| e4 | c | B |
+------+------------+-----+
Expected Output:
+------+--------+---------------+
| name | average| entities |
+------+--------+---------------+
| n1 | 85 |e1, e2, e3, e4 |
+------+--------+---------------+
Because (100+70)/2 = 85
Current output:
+------+--------+---------------+
| name | average| entities |
+------+--------+---------------+
| n1 | 92.5 |e1, e2, e3, e4 |
+------+--------+---------------+
Because (100+100+100+70)/4 = 92.5
Current Query:
SELECT
docT.name,
AVG(docT.score),
STRING_AGG(entityT.ename)
FROM
document_sentiment docT
JOIN
entity_sentiment entityT
ON
docT.dId = entityT.dId
GROUP BY
docT.cname
How can I get the score as in the expected output?
Upvotes: 0
Views: 146
Reputation: 1269803
This is tricky. I think window functions might be the simplest solution:
SELECT docT.name, docT.avg_score,
STRING_AGG(entityT.ename)
FROM (SELECT docT.*,
AVG(docT.score) OVER (PARTITION BY docT.cname) as avg_score
FROM document_sentiment docT
) docT JOIN
entity_sentiment entityT
ON docT.dId = entityT.dId
GROUP BY docT.cname, docT.avg_score;
Why is this tricky? Well, if you aggregate by cname
, then you lose dId
and cannot do the JOIN
.
Pre-aggregation doesn't do the trick. Happily, that is solved using window functions.
Upvotes: 0
Reputation: 799
Try this
select t.name, av,
GROUP_CONCAT(DISTINCT entityT.name ORDER BY entityT.name SEPARATOR ', ') AS entities
from (
SELECT docT.dId, docT.name,
AVG(docT.score) av
FROM document_sentiment docT
GROUP BY docT.name) T
JOIN entity_sentiment entityT ON T.dId = entityT.dId
GROUP BY T.name
Upvotes: 1
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
You can test, play with above using sample data from your question as in example below
#standardSQL
WITH `project.dataset.document_sentiment` AS (
SELECT 'A' dId, 'n1' name, 100 score UNION ALL
SELECT 'B', 'n1', 70
), `project.dataset.entity_sentiment` AS (
SELECT 'e1' ename, 'a' details, 'A' dId UNION ALL
SELECT 'e2', 'a', 'A' UNION ALL
SELECT 'e3', 'b', 'A' UNION ALL
SELECT 'e4', 'c', 'B'
)
SELECT
docT.name,
AVG(docT.score) average,
STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
SELECT dId, STRING_AGG(ename) ename
FROM `project.dataset.entity_sentiment`
GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name
Row name average entities
1 n1 85.0 e1,e2,e3,e4
Upvotes: 1
Reputation: 1109
Try the below code
select name, ename, avg(score) as score
from (SELECT
docT.name,
doct.Did,
MAX(docT.score) as score,
STRING_AGG(entityT.ename) as ename
FROM
document_sentiment docT
JOIN
entity_sentiment entityT
ON
docT.dId = entityT.dId
GROUP BY
docT.cname, doct.Did
) sub
group by name, ename
Upvotes: 1