Wei Jun
Wei Jun

Reputation: 31

SQL Retrieve data with more than 1 record

I was encountered with a "not a GROUP BY expression"error.

I would need to search for similar title, medium and description. Artist table consist of artistid, artist last name and artist first name. work table consist of workid, title, medium, description, artistid artistid are unique key.

List the details of any works of art that have more than one copy recorded in the database.

SELECT W.workid, W.title, W.medium, W.description, W.artistid, A.FirstName, A.LastName
FROM dtoohey.work W, dtoohey.artist A
GROUP BY W.artistid, A.FirstName, A.LastName 
having count(*) > 1;

Upvotes: 2

Views: 5891

Answers (4)

Thorsten Kettner
Thorsten Kettner

Reputation: 95053

As has been mentioned, the problem is mainly that you don't yet fully understand what you are doing.

First point is your join. By merely separating the tables with commas, you are using a syntax that was made redundant more than twenty years ago. It seems rare that you, as a beginner, use this. You must have found this in a very old book or tutorial. In short: Don't join tables like this. Use explicit joins. A comma means CROSS JOIN. So what you have is:

FROM dtoohey.work W CROSS JOIN dtoohey.artist A

That means that you combine every artist with every work. This is most likely not what you want. You want to join related artists and works. Your query shows there is an artistid in your work table, so one work is made by one artist in your model. The appropriate join would hence be:

FROM dtoohey.work w 
INNER JOIN dtoohey.artist a ON a.artistid = w.artistid

The second point is that you are aggregating rows. GROUP BY W.artistid, A.FirstName, A.LastName tells the DBMS to aggregate the rows such that you get one result row per artist. With having count(*) > 1 you say you only want artists with more than one work. But in your select clause you are showing a work (W.workid, W.title, W.medium, W.description). Which one? If you only show one row per artist and each artist has more than one work, which work of an artist do you show? The DBMS notices that you forgot to tell it what to select and raises an error. And you will probably agree now that the query with the given GROUP BY and HAVING clauses makes no sense.

Upvotes: 0

Dilettant
Dilettant

Reputation: 3335

Maybe writing an answer (that focuses on @mathguy's observation of the missing join specifity and the one, that the mix of SELECT columns versus GROUP BY/HAVING does not fit) is better in finding out what your problem is and giving ideas how to enhance the question ;-) ... next time I suggest to be more heavy on the question, so the world does not have to work so heavy on the answers.

I do not think this is a MySQL, Oracle, or what not database specific problem, but more a SQL beginners learning journey ... you may want to also look for join explanations here: "Difference between Inner Join & Full join"

Starting from the minimal subset of info given in question: 2 tables artist and work related presumable over a shared id (say artist_id).

One thing directly asking for trouble in databases like PostgreSQL or ParStream is selecting columns in group by queries neither being grouped by nor aggregated / filtered. But here we go:

Create tables:

$psql> CREATE TABLE artist(artist_id INT, given_name VARCHAR(42), family_name VARCHAR(99));
CREATE TABLE
$psql> CREATE TABLE work(work_id INT, artist_id INT, title VARCHAR(42));
CREATE TABLE

Insert some data:

$psql> INSERT INTO artist VALUES(1, 'John', 'Doe');
INSERT 0 1
$psql> INSERT INTO artist VALUES(2, 'Natalie', 'Noir');
INSERT 0 1
$psql> INSERT INTO work VALUES(43, 1, 'The game is on');
INSERT 0 1
$psql> INSERT INTO work VALUES(44, 1, 'The game is over');
INSERT 0 1
$psql> INSERT INTO work VALUES(98, 2, 'La nuit commonce');
INSERT 0 1
$psql> INSERT INTO work VALUES(97, 2, 'Un jour se lve');
INSERT 0 1

Check what is in it:

$psql> SELECT * FROM work;
 work_id | artist_id |      title       
---------+-----------+------------------
      43 |         1 | The game is on
      44 |         1 | The game is over
      98 |         2 | La nuit commonce
      97 |         2 | Un jour se lve
(4 rows)

$psql> SELECT * FROM artist;
 artist_id | given_name | family_name 
-----------+------------+-------------
         1 | John       | Doe
         2 | Natalie    | Noir
(2 rows)

Show the implicit INNER JOIN:

$psql> SELECT * FROM work W, artist A;
 work_id | artist_id |      title       | artist_id | given_name | family_name 
---------+-----------+------------------+-----------+------------+-------------
      43 |         1 | The game is on   |         1 | John       | Doe
      43 |         1 | The game is on   |         2 | Natalie    | Noir
      44 |         1 | The game is over |         1 | John       | Doe
      44 |         1 | The game is over |         2 | Natalie    | Noir
      98 |         2 | La nuit commonce |         1 | John       | Doe
      98 |         2 | La nuit commonce |         2 | Natalie    | Noir
      97 |         2 | Un jour se lve   |         1 | John       | Doe
      97 |         2 | Un jour se lve   |         2 | Natalie    | Noir
(8 rows)

Show the explicit INNER JOIN with a dummy condition to let the parser pass our query (Update: Don't use this at home, only to show the mish-mash.):

$psql> SELECT * FROM work W INNER JOIN artist A ON 1 = 1;
 work_id | artist_id |      title       | artist_id | given_name | family_name 
---------+-----------+------------------+-----------+------------+-------------
      43 |         1 | The game is on   |         1 | John       | Doe
      43 |         1 | The game is on   |         2 | Natalie    | Noir
      44 |         1 | The game is over |         1 | John       | Doe
      44 |         1 | The game is over |         2 | Natalie    | Noir
      98 |         2 | La nuit commonce |         1 | John       | Doe
      98 |         2 | La nuit commonce |         2 | Natalie    | Noir
      97 |         2 | Un jour se lve   |         1 | John       | Doe
      97 |         2 | Un jour se lve   |         2 | Natalie    | Noir
(8 rows)

Now a more useful INNER JOIN matching only these entries from the two tables, that are related through "creator" relationship:

$psql> SELECT * FROM work W INNER JOIN artist A ON W.artist_id = A.artist_id;
 work_id | artist_id |      title       | artist_id | given_name | family_name 
---------+-----------+------------------+-----------+------------+-------------
      43 |         1 | The game is on   |         1 | John       | Doe
      44 |         1 | The game is over |         1 | John       | Doe
      98 |         2 | La nuit commonce |         2 | Natalie    | Noir
      97 |         2 | Un jour se lve   |         2 | Natalie    | Noir
(4 rows)

So above we blindly trust the data managing part to magically enter artist_id values always correctly and matching our expectations (in real life a REFERENCES foreign key constraint would surely placed on the column in the work table (no work without an artist/creator would dictate the artist table to be the "first" causally).

Above you also see that selecting from a list of tables is identical to an INNER JOIN without any constraint i.e. the cartesian product of all entries from table work with all all entries from table artist.

Now your query (edited a tad for the minimal table model) besides not clear to me in its request idea, errors out as explained on top of this answer text:

$psql>  SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.family_name FROM work W, artist A GROUP BY W.artist_id, A.given_name, A.family_name HAVING COUNT(*) > 1;

ERROR:  column "w.work_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.fam...

This is of course not cured by using a more meaningful joined input set (claro, as the error points to the mismatch in select and group by lists:

$psql> SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.family_name FROM work W INNER JOIN artist A ON W.artist_id = A.artist_id GROUP BY W.artist_id, A.given_name, A.family_name HAVING COUNT(*) > 1;

ERROR:  column "w.work_id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.fam...

You need suggest input on what you want to achieve to obtain a single answer (working). Until you do so, here are some offerings:

Given that you join on only existing pairs of artis and work ids, you do not need that having clause, as neither non-existing artist nor missing works nor missing combinations of artists and works will ever enter the rowset your query works on, so:

$psql> SELECT title, R.* FROM ( SELECT W.work_id AS work_id_filtered, W.artist_id, A.given_name, A.family_name FROM work W INNER JOIN artist A ON W.artist_id = A.artist_id GROUP BY W.work_id, W.artist_id, A.given_name, A.family_name) R INNER JOIN work WW ON WW.work_id = R.work_id_filtered;
      title       | work_id_filtered | artist_id | given_name | family_name 
------------------+------------------+-----------+------------+-------------
 The game is on   |               43 |         1 | John       | Doe
 The game is over |               44 |         1 | John       | Doe
 La nuit commonce |               98 |         2 | Natalie    | Noir
 Un jour se lve   |               97 |         2 | Natalie    | Noir
(4 rows)

This should give you a bit clumsy but good enough for my sunday morning the nice list of all titles (non-grouped files) combined with the grouped fields from the inner query. Formatted query might be written as:

SELECT title,
       R.*
FROM
  (SELECT W.work_id AS work_id_filtered,
                       W.artist_id,
                       A.given_name,
                       A.family_name
   FROM
   work W
   INNER JOIN artist A ON W.artist_id = A.artist_id
   GROUP BY W.work_id,
            W.artist_id,
            A.given_name,
            A.family_name) R
INNER JOIN
work WW ON WW.work_id = R.work_id_filtered;

Removing any GROUP BY (until the question offers detail on why it would be needed for the task):

$psql> SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.family_name FROM work W INNER JOIN artist A ON W.artist_id = A.artist_id;

 work_id |      title       | artist_id | given_name | family_name 
---------+------------------+-----------+------------+-------------
      43 | The game is on   |         1 | John       | Doe
      44 | The game is over |         1 | John       | Doe
      98 | La nuit commonce |         2 | Natalie    | Noir
      97 | Un jour se lve   |         2 | Natalie    | Noir
(4 rows)

Query formatted to not have to scroll horizontally:

SELECT W.work_id, W.title,  W.artist_id, A.given_name, A.family_name 
FROM work W INNER JOIN artist A ON W.artist_id = A.artist_id;

Note: Yes, as @ThorstenKettner pointed rightfully out, I made up the term "FULL INNER JOIN", I smoetimes do, sorry. Maybe my brain needs cartesian filling to balance the LEFT|RIGHT|FULL OUTER JOINs - who knows ;-)

Upvotes: 1

Wei Jun
Wei Jun

Reputation: 31

I used scaisEdge answer and edited. What I realize was if I select workid(unique key), I won't be able to display anything but without it, all my field is up.

Upvotes: 0

ScaisEdge
ScaisEdge

Reputation: 133380

Seems you don't have a proper join between the tables (i have suggest one .. you should do the correct one)

If you want the group by on W.artistIid (alias the count of the workid for each artist) you cannot have W.workid, W.title, W.medium in select

SELECT  W.artistid, A.FirstName, A.LastName
FROM dtoohey.work W
INNER JOIN dtoohey.artist A ON A.artistid = W.artistid
GROUP BY W.artistid, A.FirstName, A.LastName 
having count(*) > 1;

otherwise if you want check if the select return more that one rows for the column select you must add all column to the group by clause

SELECT W.workid, W.title, W.medium, W.description, W.artistid, A.FirstName, A.LastName
FROM dtoohey.work W
INNER JOIN dtoohey.artist A ON A.artistid = W.artistid
GROUP BY W.workid, W.title, W.medium, W.description, W.artistid, A.FirstName, A.LastName 
having count(*) > 1;

Upvotes: 1

Related Questions