AME
AME

Reputation: 2609

What is the best way to model an entity that can have different attributes?

Let's say I am programming a MMORPG. I modeled a Entity character that can have a multitude of attributes like coating, strength, color and so on. Because I do not know these attributes in advance (what and how many of them), I thought I create an extra table for it, like so:

CREATE TABLE character (INTEGER id, VARCHAR name, INTEGER player_id);

and

CREATE TABLE attributes (INTEGER character_id, VARCHAR key, VARCHAR value);

I would then be able to introduce a multitude of new attributes. However, how would I query this construct? The query

SELECT * FROM character JOIN attributes ON character.id=attributes.character_id;

will obvioulsy only work for a single attribute. Do I have to JOIN the attributes table more than once or is there another solution?

Is there a way to have different types for the attribute.value Part? Doing it the way I am doing now would limit me to a VARCHAR representation.

Upvotes: 2

Views: 282

Answers (2)

user731136
user731136

Reputation:

Another possibility is to use hstore instead of EAV model.

CREATE TABLE character (id INTEGER, name VARCHAR,
                        player_id INTEGER, attributes hstore);

That way you can store the attributes as a map (key - value).

insert into character (id, name, player_id, attributes)
values (1, 'test', 1, '"attribute1"=>"value1","attribute2"=>"value2"')
      ,(2, 'test', 1, '"attribute1"=>"value1","attribute3"=>"value3"');

select (each(attributes)).key, (each(attributes)).value 
from character where id = 1;

  key text    value text
  --------------------------
   attribute1   value1
   attribute2   value2

select id, attributes->'attribute3' as value 
from character WHERE exist(attributes,'attribute3');

  id    value
  ---------------
   2   "value3"

Hope this helps.

UPDATE

I made a small benchmark to compare hstore vs two tables.

CREATE OR REPLACE FUNCTION create_dummy_data()
RETURNS integer AS
$BODY$
DECLARE
   cont1       INTEGER;
   cont2       INTEGER;
   sqlInsert   VARCHAR;

BEGIN
   CREATE TABLE character (id INTEGER PRIMARY KEY
                          ,name VARCHAR
                          ,player_id INTEGER);

   CREATE TABLE attributes (character_id INTEGER
                           ,key VARCHAR
                           ,value VARCHAR
                           ,FOREIGN KEY (character_id) REFERENCES character);

   cont1 := 1;
   WHILE cont1 < 10000 LOOP
      sqlInsert := 'INSERT INTO character (id, name, player_id) VALUES (' || cont1 || ', ''character' || cont1 || ''', ' || cont1 || ');';
      EXECUTE sqlInsert;
      cont1 := cont1 + 1;
   END LOOP;

   cont1 := 1;
   WHILE cont1 < 10000 LOOP
      cont2 := 1;
      WHILE cont2 < 10 LOOP   
         sqlInsert := 'INSERT INTO attributes (character_id, key, value) VALUES (' || cont1 || ', ''key' || cont2 || ''', ' || cont2 || ');';
         EXECUTE sqlInsert;
         cont2 := cont2 + 1;
      END LOOP;       
      cont1 := cont1 + 1;
    END LOOP;

    CREATE TABLE character_hstore (id INTEGER
                                  ,name VARCHAR
                                  ,player_id INTEGER
                                  ,attributes hstore);
    cont1 := 1;
    WHILE cont1 < 10000 LOOP
       sqlInsert := 'INSERT INTO character_hstore (id, name, player_id, attributes) VALUES (' || cont1 || ', ''character' || cont1 || ''', ' || cont1 || ', ''"key1"=>"1","key2"=>"2","key3"=>"3","key4"=>"4","key5"=>"5"'');';
       EXECUTE sqlInsert;
       cont1 := cont1 + 1;
    END LOOP;   

    RETURN 1;
 END;
 $BODY$
 LANGUAGE plpgsql;

 select * from create_dummy_data();

 DROP FUNCTION create_dummy_data();

And I've got the following results:

explain analyze
SELECT ca.* 
FROM character ca
JOIN attributes at ON ca.id = at.character_id
WHERE at.value = '1';
"Hash Join  (cost=288.98..2152.77 rows=10076 width=21) (actual time=2.788..23.186 rows=9999 loops=1)"

CREATE INDEX ON attributes (value);

explain analyze
SELECT ca.* 
FROM character ca
JOIN attributes at ON ca.id = at.character_id
WHERE at.value = '1';
"Hash Join  (cost=479.33..1344.18 rows=10076 width=21) (actual time=4.330..13.537 rows=9999 loops=1)"

And using hstore:

explain analyze
SELECT * 
FROM character_hstore
WHERE attributes @> 'key1=>1';
"Seq Scan on character_hstore  (cost=0.00..278.99 rows=10 width=91) (actual time=0.012..3.530 rows=9999 loops=1)"

explain analyze
SELECT * 
FROM character_hstore
WHERE attributes->'key1' = '1';
"Seq Scan on character_hstore  (cost=0.00..303.99 rows=50 width=91) (actual time=0.016..4.806 rows=9999 loops=1)"

Upvotes: 4

Jens Schauder
Jens Schauder

Reputation: 81862

What you are describing is an Entity-Attribute-Value model. Google EAV, you'll find lots of information about it.

And yes querying is the big drawback of this kind of query. Selects get very large, complex and possibly slow.

As @doctore describes there are database specific features available:

CREATE TABLE character (INTEGER id, VARCHAR name,
                    INTEGER player_id, attributes hstore);

You can read more about hstore, follow the link.

If this entity is central to your application you might want to considere an dedicated key value store for this kind of data. For example CouchDB

Upvotes: 2

Related Questions