Reputation: 75
Here's the relational model for the database that I'm going to talk about (a show can have multiple actors and multiple genres): Relational model (sorry, don't have enough reputation to insert an image)
What I would like to achieve is to get every detail about a specific show + the genres that it has + the actors that played in the show and their IDs.
I tried:
SELECT shows.*,
STRING_AGG(g.name, ', ') AS genres,
STRING_AGG(CAST(a.id AS VARCHAR), ', ') AS actor_ids,
STRING_AGG(a.name, ', ') AS actors
FROM shows
LEFT JOIN show_genres sg ON shows.id = sg.show_id
LEFT JOIN genres g ON g.id = sg.genre_id
LEFT JOIN show_characters sc ON shows.id = sc.show_id
LEFT JOIN actors a ON sc.actor_id = a.id
WHERE shows.id = 1390
GROUP BY shows.id
The result:
genres | actor_ids | actors |
---|---|---|
Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Action, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Adventure, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Drama, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy, Fantasy | 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041, 436511, 436512, 412933, 9739, 414839, 436539, 436541, 11439, 419217, 415442, 413734, 432873, 420752, 421890, 436592, 448107, 458178, 150561, 436590, 412041 | Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann, Emilia Clarke, Kit Harington, Peter Dinklage, Lena Headey, Nikolaj Coster-Waldau, Sophie Turner, Maisie Williams, Iain Glen, Alfie Allen, Liam Cunningham, John Bradley, Conleth Hill, Aidan Gillen, Gwendoline Christie, Isaac Hempstead-Wright, Kristofer Hivju, Nathalie Emmanuel, Jacob Anderson, Jerome Flynn, Rory McCann |
Note: only showing the problematic columns
The problem: After joining the "show_characters" and the "actors" table, there are a lot of duplicate data in the string aggregations.
Question: How would I solve this? Without having duplicate data.
My workaround:
@connection_handler
def get_show_by_id(cursor: 'RealDictCursor', id: int) -> 'RealDictRow':
"""
Args:
cursor: a cursor which returns dictionaries (use @connection.connection_handler decorator)
id: number of items shows on a single page
Returns:
All show details in a RealDictRow + genres(as a concatenated string) + actors in a list
containing the actors ids and names in a RealDictRow
"""
query = """
SELECT shows.*,STRING_AGG(g.name,',') AS genres
FROM shows
LEFT JOIN show_genres sg ON shows.id = sg.show_id
LEFT JOIN genres g ON sg.genre_id = g.id
WHERE shows.id = %s
GROUP BY shows.id
"""
val = (id,)
cursor.execute(query, val)
shows = cursor.fetchone()
query = """
SELECT a.id,a.name
FROM actors a
RIGHT JOIN show_characters sc ON a.id = sc.actor_id
RIGHT JOIN shows s ON sc.show_id = s.id
WHERE s.id = %s
GROUP BY a.id
"""
val = (id,)
cursor.execute(query, val)
actors = cursor.fetchall()
shows["actors"] = actors
return shows
Note for workaround: Used two separate queries to achieve the result I wanted, but I'd like to know how would I be able to do this with one SQL query.
Side note: I'd be really happy to receive any feedback about my code in general, or suggestion about how people usually handle when they get back multiple data from an SQL query, do they just split it? Or is it better to write a separate query, like I did, so I get the data in a separate list right away? What about my docstring? I just started trying to use them, not sure if the way I'm doing it is good or not.
Upvotes: 0
Views: 981
Reputation: 75
Meanwhile I've been able to come up with a solution myself, just to have an easier time managing the data returned later, I convert the returned array of arrays into a list of dictionaries.
@connection_handler
def get_show_by_id(cursor: 'RealDictCursor', id: int) -> 'RealDictRow':
"""
Args:
cursor: a cursor which returns dictionaries (use @connection.connection_handler decorator)
id: number of items shows on a single page
Returns:
All show details in a RealDictRow + genres(as a concatenated string) + actors in a list
containing the actors ids and names in a dictionary
"""
# ONE QUERY SOLUTION:
# concatenating actors names and ids together to prevent issue when actors share the same name
# and worked on the same show
query = """
SELECT
shows.*,
STRING_AGG(DISTINCT g.name, ',') AS genres,
ARRAY_AGG(DISTINCT ARRAY[a.id::VARCHAR, a.name]) AS actors_ids_with_names
FROM shows
LEFT JOIN show_genres sg ON shows.id = sg.show_id
LEFT JOIN genres g ON g.id = sg.genre_id
LEFT JOIN show_characters sc ON shows.id = sc.show_id
LEFT JOIN actors a ON sc.actor_id = a.id
WHERE shows.id=%s
GROUP BY shows.id
"""
val = (id,)
cursor.execute(query, val)
show = cursor.fetchone()
actor = [{"id":actor_id_with_name[0], "name": actor_id_with_name[1] } for actor_id_with_name in show["actors_ids_with_names"]]
show.pop("actors_ids_with_names")
show["actors"] = actor
return show
Upvotes: 0
Reputation: 525
One approach in this sort of situation is to STRING_AGG each of the constituent tables (or sets of tables) first; then, LEFT JOIN those contrived tables onto the main table. This sidesteps the problem of multiplication that can occur during consecutive LEFT JOINs.
In your case, try something like this:
SELECT
shows.*,
show_genre_names.genre_names,
show_actors.actor_ids,
show_actors.actor_names
FROM
shows
LEFT JOIN
( -- one row per show_id
SELECT
sg.show_id,
STRING_AGG(g.name, ', ') AS genre_names
FROM
show_genres sg
JOIN genres g ON g.id = sg.genre_id
GROUP BY
sg.show_id
) show_genre_names
ON shows.id = show_genre_names.show_id
LEFT JOIN
( -- one row per show_id
SELECT
sc.show_id,
STRING_AGG(a.id, ', ') AS actor_ids,
STRING_AGG(a.name, ', ') AS actor_names
FROM
show_characters sc
JOIN actors a ON a.id = sc.actor_id
GROUP BY
sc.show_id
) show_actors
ON shows.id = show_actors.show_id
WHERE
shows.id = 1390
;
You can solve this in other ways, too, but understanding this technique will be helpful in your SQL journey.
Upvotes: 1