Aron Rotteveel
Aron Rotteveel

Reputation: 83203

Combining SUM(), GROUP_BY() and LEFT_JOIN() returns incorrect results: how to fix?

I am writing a query that should return aggregated hours for multiple users per day/month/year.

The table looks something like this:

+------------------------------------------+
| id | entity_id | minutes | person | date |
+------------------------------------------+

How output should look:

+----------------------------+
| year | month | day | hours |
| 2008 | 12    | 1   | 30    |
| 2008 | 12    | 2   | 40    |
| 2008 | 12    | 3   | 23    |
+----------------------------+

Instead, the hours are often a lot more due to the returned rows caused by the left join.

The problem is that I need to query this table based on the tags that are linked to the corresponding entities. When I join the two tables (tag_entity that provides the link and tags that provides the actual tag names) my SUM() no longer works since there are too many results being returned.

The query:

select 
    date_format(from_unixtime(date), '%Y-%m-%d') as myDate,
    ROUND(SUM(time) / 60,1) as hours

from time h

left join tag_entity te on te.entity_id = h.entity_id
left join tags t on t.tag_id = te.tag_id

where (t.tag_name NOT IN ('foo', 'bar', 'baz') OR t.tag_name IS NULL) 

group by
    myDate

order by
    hours DESC, myDate ASC

How can I fix this?

EDIT:

Here are the schemas for tag and tag_entity:

Tag:

+----------+-------------+
| Field    | Type        |
+----------+-------------+
| tag_id   | int(11)     |
| tag_name | varchar(50) |
+----------+-------------+

And tag_entity:

+-----------+---------+
| Field     | Type    |
+-----------+---------+
| id        | int(11) |
| tag_id    | int(11) |
| entity_id | int(11) |
+-----------+---------+

Upvotes: 1

Views: 303

Answers (3)

The Scrum Meister
The Scrum Meister

Reputation: 30131

GROUP BY groups the results, not the table rows individually.

Based on your comment only return rows in the time table that are not linked to one of those tags:

SELECT 
    date_format(from_unixtime(date), '%Y-%m-%d') as myDate,
    ROUND(SUM(time) / 60,1) as hours
FROM `time` h
  LEFT JOIN (
    SELECT DISTINCT te.entity_id
    FROM tag_entity te
      LEFT JOIN tags t on t.tag_id = te.tag_id
    WHERE te.entity_id IS NOT NULL AND t.tag_name IN ('foo', 'bar', 'baz')
  ) g ON h.entity_id = g.entity_id
WHERE g.entity_id IS NULL
group by
    myDate

order by
    hours DESC, myDate ASC

Upvotes: 2

jswolf19
jswolf19

Reputation: 2303

Probably you want something like this:

select 
    date_format(from_unixtime(date), '%Y-%m-%d') as myDate,
    ROUND(SUM(time) / 60,1) as hours

from time h

left join tag_entity te on te.entity_id = h.entity_id

where NOT EXISTS(select te.entity_id

                 from tag_entity te

                 join tags t on t.tag_id = te.tag_id

                 where te.tag_entity = h.entity_id  AND t.tag_name IN ('foo', 'bar', 'baz')) 

group by
    myDate

order by
    hours DESC, myDate ASC

Upvotes: 0

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125444

select 
    date_format(from_unixtime(date), '%Y-%m-%d') as jaar,
    ROUND(SUM(time) / 60,1) as uren

from time h

left join tag_entity te on te.entity_id = h.entity_id
left join tags t on t.tag_id = te.tag_id

where (t.tag_name NOT IN ('foo', 'bar', 'baz') OR t.tag_name IS NULL) 

group by jaar

order by
    uren DESC, jaar ASC

Upvotes: 0

Related Questions