gad_gadskiy
gad_gadskiy

Reputation: 220

group_concat works incorrect on multiple join

I have a problem with building select request with joining and concatenation data from 3 tables.

First table entity has ids of some entities with their authors:

id author
11 "John"
12 "Mike"
13 "Kevin"

2nd and 3rd tables have sections and files related to this entities, one row for each. Count of sections and files can be different for any entity.

file:

id entity_id file_name
1 11 file1
2 12 file1
3 12 file2
4 12 file3
5 13 file4
6 13 file5
7 13 file6

section (also some entities can be w/o sections like 12 in this example):

id entity_id section_id
1 11 1001
2 11 1002
3 13 1003

I need to select all data from entities table joining related sections and files as comma-separated strings. For this purpose I've created following request:

SELECT 
    entity.id, 
    entity.author, 
    group_concat(section.section_id) section_ids, 
    group_concat(file.file_name) files 
FROM entity
LEFT JOIN file ON entity.id = file.entity_id
LEFT JOIN section ON entity.id = section.entity_id
group by entity.id;

I'm expecting to get following result:

id author files section_ids
11 "John" file1 1001,1002
12 "Mike" file1,file2,file3 null
13 "Kevin" file4,file5,file6 1003

But actually I'm getting this one:

id author files section_ids
11 "John" file1,file1 1001,1002
12 "Mike" file1,file2,file3 null
13 "Kevin" file4,file5,file6 1003,1003,1003

Looks like files are duplicated where entity has multiple sections and sections are duplicated when entity has multiple files. I tried to play with different types of join (inner/outher, right/left) but didn't find any solution. Please help me to fix this query.

Upvotes: 1

Views: 467

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1270773

You are joining against two different dimensions, resulting in a Cartesian product. The simplest adjustment to your query is DISTINCT:

SELECT e.id, e.author, 
      group_concat(distinct s.section_id) as section_ids, 
      group_concat(f.file_name) as files 
FROM entity e LEFT JOIN
     file f
     ON e.id = f.entity_id LEFT JOIN
     section s
     ON e.id = s.entity_id
group by e.id;

However, for performance, I would recommend the approach that GMB suggests.

Upvotes: 0

GMB
GMB

Reputation: 222632

The problem is when you have multiple matchs in both tables for a given entity: the joins multiply the rows, and the results of the aggregates are wrong.

I would recommend pre-aggregation. A couple of subqueries should do the job just fine:

select e.id, e.author, 
    (select group_concat(f.file_name) from file f where f.entity_id = e.id) as files_names
    (select group_concat(s.section_id) from section s where s.entity_id = e.id) as section_ids
from entity e

Upvotes: 3

Related Questions