Reputation: 826

How to JOIN M:M table the right way?

My database structure:

CREATE TABLE categories (
    name VARCHAR(30) PRIMARY KEY
);

CREATE TABLE additives (
    name VARCHAR(30) PRIMARY KEY
);

CREATE TABLE beverages (
    name VARCHAR(30) PRIMARY KEY,
    description VARCHAR(200),
    price NUMERIC(5, 2) NOT NULL CHECK (price >= 0),
    category VARCHAR(30) NOT NULL REFERENCES categories(name) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE b_additives_xref (
    bname VARCHAR(30) REFERENCES beverages(name) ON DELETE CASCADE ON UPDATE CASCADE,
    aname VARCHAR(30) REFERENCES additives(name) ON DELETE CASCADE ON UPDATE CASCADE,
    PRIMARY KEY(bname, aname)
);


INSERT INTO categories VALUES
    ('Cocktails'), ('Biere'), ('Alkoholfreies');

INSERT INTO additives VALUES
    ('Kaliumphosphat (E 340)'), ('Pektin (E 440)'), ('Citronensäure (E 330)');

INSERT INTO beverages VALUES
    ('Mojito Speciale', 'Cocktail mit Rum, Rohrzucker und Minze', 8, 'Cocktails'),
    ('Franziskaner Weißbier', 'Köstlich mildes Hefeweizen', 6, 'Biere'),
    ('Augustiner Hell', 'Frisch gekühlt vom Fass', 5, 'Biere'),
    ('Coca Cola', 'Coffeeinhaltiges Erfrischungsgetränk', 2.75, 'Alkoholfreies'),
    ('Sprite', 'Erfrischende Zitronenlimonade', 2.50, 'Alkoholfreies'),
    ('Karaffe Wasser', 'Kaltes, gashaltiges Wasser', 6.50, 'Alkoholfreies');

INSERT INTO b_additives_xref VALUES
    ('Coca Cola', 'Kaliumphosphat (E 340)'),
    ('Coca Cola', 'Pektin (E 440)'),
    ('Coca Cola', 'Citronensäure (E 330)');

SqlFiddle

I am trying to list all beverages and their attributes (price, description, etc.) and add another column additives from the b_additives_xref table that holds a concatenated string with all additives contained in each beverage:

SELECT
    beverages.name AS name,
    beverages.description AS description,
    beverages.price AS price,
    beverages.category AS category,
    string_agg(additives.name, ', ') AS additives
FROM beverages, additives
    LEFT JOIN b_additives_xref ON b_additives_xref.aname = additives.name
GROUP BY beverages.name
ORDER BY beverages.category;

The output:

Coca Cola       | Coffeeinhaltiges Erfrischungsgetränk | 2.75 | Alkoholfreies | Kaliumphosphat (E 340), Pektin (E 440), Citronensäure (E 330)
Karaffe Wasser  | Kaltes, gashaltiges Wasser           | 6.50 | Alkoholfreies | Kaliumphosphat (E 340), Pektin (E 440), Citronensäure (E 330)
Sprite          | Erfrischende Zitronenlimonade        | 2.50 | Alkoholfreies | Kaliumphosphat (E 340), Pektin (E 440), Citronensäure (E 330)
Augustiner Hell | Frisch gekühlt vom Fass              | 5.00 | Biere         | Kaliumphosphat (E 340)[...]

Which is wrong since only 'Coca Cola' has existing rows in the b_additives_xref table. Except for the row 'Coca Cola' all other rows should have 'null' or 'empty field' values in the column additives. What am I doing wrong?

Upvotes: 3

Answers (3)

Brad

Reputation: 12253

I believe you are looking for this

SELECT 
    B.name AS name, 
    B.description AS description, 
    B.price AS price,
    B.category AS category, 
    string_agg(A.name, ', ') AS additives 
FROM Beverages B
    LEFT JOIN b_additives_xref xref ON xref.bname = B.name 
    Left join additives A on A.name = xref.aname
GROUP BY B.name
ORDER BY B.category;

Output

NAME    DESCRIPTION                                 PRICE   CATEGORY        ADDITIVES
Coca Cola   Coffeeinhaltiges Erfrischungsgetränk    2.75    Alkoholfreies   Kaliumphosphat (E 340), Pektin (E 440), Citronensäure (E 330)

The problem was that you had a Cartesian product between your beverages and additives tables

FROM beverages, additives

Every record got placed with every other record. They both need to be explicitly joined to the xref table.

Upvotes: 2

Erwin Brandstetter

Reputation: 657982

Some advice on your

Schema

CREATE TABLE category (
   category_id int PRIMARY KEY
  ,category    text UNIQUE NOT NULL
);

CREATE TABLE beverage (
   beverage_id serial PRIMARY KEY
  ,beverage    text UNIQUE NOT NULL  -- maybe not unique?
  ,description text
  ,price       int NOT NULL CHECK (price >= 0)  -- in Cent
  ,category_id int NOT NULL REFERENCES category ON UPDATE CASCADE
                                        -- not: ON DELETE CASCADE 
);

CREATE TABLE additive (
   additive_id serial PRIMARY KEY
  ,additive    text UNIQUE
);

CREATE TABLE bev_add (
    beverage_id int REFERENCES beverage ON DELETE CASCADE ON UPDATE CASCADE
   ,additive_id int REFERENCES additive ON DELETE CASCADE ON UPDATE CASCADE 
   ,PRIMARY KEY(beverage_id, additive_id)
);

Never use "name" as name. It's a terrible, non-descriptive name.
Use small surrogate primary keys, preferably serial columns for big tables or simple integer for small tables. Chances are, the names of beverages and additives are not strictly unique and you want to change them from time to time, which makes them bad candidates for the primary key. integer columns are also smaller and faster to process.
If you only have a handful of categories with no additional attributes, consider an enum instead.
It's good practice to use the same (descriptive) name for foreign key and primary key when they hold the same values.
I never use the plural form as table name unless a single row holds multiple instances of something. Shorter, just a meaningful, leaves the plural for actual plural rows.
Just use text instead of character varying (n).
Think twice before you define a fk constraint to a look-up table with ON DELETE CASCADE
Typically you do not want to delete all beverages automatically if you delete a category (by mistake).
Consider a plain integer column instead of NUMERIC(5, 2) (with the number of Cent instead of € / $). Smaller, faster, simpler. Format on output when needed.

More advice and links in this closely related answer:
How to implement a many-to-many relationship in PostgreSQL?

Query

Adapted to new schema and some general advice.

SELECT b.*, string_agg(a.additive, ', ' ORDER BY a.additive) AS additives
                                     -- order by optional for sorted list
FROM   beverage      b
JOIN   category      c USING (category_id)
LEFT   JOIN bev_add ba USING (beverage_id)  -- simpler now
LEFT   JOIN additive a USING (additive_id)
GROUP  BY b.beverage_id, c.category_id
ORDER  BY c.category;

You don't need a column alias if the column name is the same as the alias.
With the suggested naming convention you can conveniently use USING in joins.
You need to join to category and GROUP BY category_id or category in addition (drawback of suggested schema).
The query will still be faster for big tables, because tables are smaller and indexes are smaller and faster and fewer pages have to be read.

Upvotes: 2

phew

Reputation: 826

The query I was looking for looks like:

SELECT 
    B.name AS name, 
    B.description AS description, 
    B.price AS price,
    B.category AS category, 
    string_agg(A.name, ', ') AS additives 
FROM beverages B
    LEFT JOIN b_additives_xref xref ON xref.bname = B.name 
    LEFT JOIN additives A on A.name = xref.aname
GROUP BY B.name
ORDER BY B.category;

Credits go to Brad for kind of giving me the solution in his answer & comments.

Upvotes: 1

How to JOIN M:M table the right way?

Answers (3)

Schema

Query

Related Questions