Reputation: 4742

SQL - join two tables, but get average of the joined column only by left table occurrences

I wish to join two tables, but get average of the joined column only by left table occurrences

document:

+-----+-----+-------+
| dId | name| score |
+-----+-----+-------+
| A   | n1  | 100   |
| B   | n1  | 70    |
+-----+-----+-------+

entity:

+------+------------+-----+
| ename| details    | dId |
+------+------------+-----+
| e1   | a          |   A |
| e2   | a          |   A |
| e3   | b          |   A |
| e4   | c          |   B |
+------+------------+-----+

Expected Output:

+------+--------+---------------+
| name | average| entities      |
+------+--------+---------------+
| n1    | 85    |e1, e2, e3, e4 |
+------+--------+---------------+

Because (100+70)/2 = 85

Current output:

+------+--------+---------------+
| name | average| entities      |
+------+--------+---------------+
| n1    | 92.5  |e1, e2, e3, e4 |
+------+--------+---------------+

Because (100+100+100+70)/4 = 92.5

Current Query:

SELECT
  docT.name,
  AVG(docT.score),
  STRING_AGG(entityT.ename)
FROM
  document_sentiment docT
JOIN
  entity_sentiment entityT
ON
  docT.dId = entityT.dId
GROUP BY
  docT.cname

How can I get the score as in the expected output?

Upvotes: 0

Answers (4)

Gordon Linoff

Reputation: 1269803

This is tricky. I think window functions might be the simplest solution:

SELECT docT.name, docT.avg_score,
       STRING_AGG(entityT.ename)
FROM (SELECT docT.*,
             AVG(docT.score) OVER (PARTITION BY docT.cname) as avg_score
      FROM document_sentiment docT
     ) docT JOIN
     entity_sentiment entityT
    ON docT.dId = entityT.dId
GROUP BY docT.cname, docT.avg_score;

Why is this tricky? Well, if you aggregate by cname, then you lose dId and cannot do the JOIN.

Pre-aggregation doesn't do the trick. Happily, that is solved using window functions.

Upvotes: 0

Jahirul Islam Bhuiyan

Reputation: 799

Try this

select  t.name, av,  
    GROUP_CONCAT(DISTINCT entityT.name ORDER BY entityT.name SEPARATOR ', ') AS entities
from (
    SELECT docT.dId, docT.name,
          AVG(docT.score) av
    FROM document_sentiment docT
    GROUP BY docT.name) T
JOIN entity_sentiment entityT ON T.dId = entityT.dId
GROUP BY T.name

SQL Fiddle

Upvotes: 1

Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT
  docT.name,
  AVG(docT.score) average,
  STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
  SELECT dId, STRING_AGG(ename) ename
  FROM `project.dataset.entity_sentiment`
  GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name

You can test, play with above using sample data from your question as in example below

#standardSQL
WITH `project.dataset.document_sentiment` AS (
  SELECT 'A' dId, 'n1' name, 100 score UNION ALL
  SELECT 'B', 'n1', 70 
), `project.dataset.entity_sentiment` AS (
  SELECT 'e1' ename, 'a' details, 'A' dId UNION ALL
  SELECT 'e2', 'a', 'A' UNION ALL
  SELECT 'e3', 'b', 'A' UNION ALL
  SELECT 'e4', 'c', 'B' 
)
SELECT
  docT.name,
  AVG(docT.score) average,
  STRING_AGG(entityT.ename) entities
FROM `project.dataset.document_sentiment` docT
JOIN (
  SELECT dId, STRING_AGG(ename) ename
  FROM `project.dataset.entity_sentiment`
  GROUP BY dId
) entityT
ON docT.dId = entityT.dId
GROUP BY docT.name  

Row name    average     entities     
1   n1      85.0        e1,e2,e3,e4

Upvotes: 1

Gaurav

Reputation: 1109

Try the below code

select name, ename, avg(score) as score
from (SELECT
  docT.name,
  doct.Did,
  MAX(docT.score) as score,
  STRING_AGG(entityT.ename) as ename
FROM
  document_sentiment docT
JOIN
  entity_sentiment entityT
ON
  docT.dId = entityT.dId
GROUP BY
  docT.cname, doct.Did
) sub
group by name, ename

Upvotes: 1

SQL - join two tables, but get average of the joined column only by left table occurrences

Answers (4)

Related Questions