SQL - querying Entity-Attribute-Value (EAV) with missing attributes

I have database that is equivalent to following table:

id |  foo |  bar
---+------+-----
 1 |    5 |    6
 2 |    7 | NULL

but, unfortunately, implemented as Entity-Attribute-Value:

CREATE TABLE obj(id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE attrdef(id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(4));
CREATE TABLE attr(obj_id INTEGER NOT NULL, attrdef_id INTEGER NOT NULL, value INTEGER NOT NULL);

INSERT INTO obj VALUES(1);
INSERT INTO obj VALUES(2);

INSERT INTO attrdef VALUES(3, 'foo');
INSERT INTO attrdef VALUES(4, 'bar');

INSERT INTO attr VALUES(1,3,5);
INSERT INTO attr VALUES(1,4,6);
INSERT INTO attr VALUES(2,3,7);

I need to query that database to get data in "proper" form - like in example table. I have tried:

SELECT obj.id, foo.value, bar.value
FROM obj 
LEFT JOIN attr foo ON (obj.id = foo.obj_id)
LEFT JOIN attrdef foo_def ON (foo.attrdef_id = foo_def.id)
LEFT JOIN attr bar ON (obj.id = bar.obj_id)
LEFT JOIN attrdef bar_def ON (bar.attrdef_id = bar_def.id)
WHERE foo_def.name = 'foo' AND bar_def.name = 'bar';

but the second row is missing:

id |  foo |  bar
---+------+-----
 1 |    5 |    6

and

SELECT obj.id,
    MAX(CASE WHEN name='foo' THEN value ELSE NULL END) foo,
    MAX(CASE WHEN name='bar' THEN value ELSE NULL END) bar
FROM obj LEFT JOIN attr ON (obj.id = attr.obj_id)
LEFT JOIN attrdef ON (attr.attrdef_id = attrdef.id)
GROUP BY obj.id;

which gives correct result:

id |  foo |  bar
---+------+-----
 1 |    5 |    6
 2 |    7 | NULL

but performance of this query is unacceptable.

I'd like standard SQL query, but good MySQL-specific solution would be appreciated.

Upvotes: 1

Views: 618

Answers (2)

HLGEM
HLGEM

Reputation: 96570

When you do this in the where caluse:

AND bar_def.name = 'bar';

You convert that left join on bar_def to an inner join. Same thing with the condition you put on Foo_def.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269973

You just need to move the conditions to the on clauses:

SELECT obj.id, foo.value, bar.value
FROM obj LEFT JOIN
     attr foo
     ON obj.id = foo.obj_id LEFT JOIN
     attrdef foo_def
     ON foo.attrdef_id = foo_def.id AND foo_def.name = 'foo' LEFT JOIN
     attr bar
     ON obj.id = bar.obj_id LEFT JOIN
     attrdef bar_def
     ON bar.attrdef_id = bar_def.id AND bar_def.name = 'bar';

For the aggregation method, I would go with:

SELECT obj.id,
       MAX(CASE WHEN name = 'foo' THEN value END) foo,
       MAX(CASE WHEN name = 'bar' THEN value END) bar
FROM obj LEFT JOIN
     attr
     ON obj.id = attr.obj_id LEFT JOIN
     attrdef
     ON attr.attrdef_id = attrdef.id
WHERE name IN ('foo', 'bar')
GROUP BY obj.id;

The left joins might not be necessary in this case (depending on the distribution of missing values). In any case, if you start looking at more and more attributes, the JOIN method takes longer and longer. The GROUP BY method has about the same performance.

EDIT:

The correct query is:

SELECT obj.id, foo.value, bar.value
FROM obj LEFT JOIN
     (attr foo JOIN
      attrdef foo_def
      ON foo.attrdef_id = foo_def.id AND foo_def.name = 'foo'
     )
     ON obj.id = foo.obj_id LEFT JOIN
     (attr bar JOIN
      attrdef bar_def
      ON bar.attrdef_id = bar_def.id AND bar_def.name = 'bar'
     )
     ON obj.id = bar.obj_id ;

Here is the SQL Fiddle.

Upvotes: 3

Related Questions