Reputation: 79
Context: I have 4 different song charts which show the current trending songs on their respective platforms. I want to aggregate the charts to produce a single chart with a "combined" ranking but I want the weights I give a rank from a particular chart to be configurable. So, I may have a unique song (uniqueness denoted by ISRC) that shows up on all 4 charts. I want to merge all 4 charts first, the aggregate the ranking.
Question: Currently, I've merged two tables but am unsure how I can merge two rows with the same ISRC into a single row while adding their ranks up? My code so far:
SELECT * FROM ((SELECT rank,
isrc,
song_name,
dataset_datetime,
'applemusic' AS SOURCE
FROM "myTable"
WHERE chart_country ='US')
UNION
(SELECT rank,
isrc,
song_name,
dataset_datetime,
'spotify' AS SOURCE
FROM "myTable2"
WHERE chart_country ='US'))
ORDER BY cast(rank as int);
Please let me know if any further clarification is required
Upvotes: 0
Views: 707
Reputation: 35563
You can use a group by
clause to reduce the rows and perform aggregation(s) of rank.
SELECT isrc
, song_name
, count(*) as num_charts
, sum(cast rank AS INT) as sum_rank
, avg(cast rank AS decimal(10, 2)) as avg_rank
FROM (
(
SELECT rank
, isrc
, song_name
FROM `myTable`
WHERE chart_country = 'US'
)
UNION ALL ## use "union all" if summing or counting the resultset
(
SELECT rank
, isrc
, song_name
FROM `myTable2`
WHERE chart_country = 'US'
)
)
GROUP BY
isrc
, song_name
ORDER BY
sum_rank DESC
, avg_rank DESC
note: I'm not sure why you order by casting rank to an integer, if that column is not defined as numeric then you open the possibility of conversion error - but I have included casts "just in case".
Please note that "union", when used by itself removes duplicate rows, hence it has the potential to upset accurate sums, counts or averages etc. So instead use "union all" which does not attempt to remove duplicate rows (and can be a bit faster because of this).
I have removed 2 columns as they won't summarise well although you could use GROUP_CONCAT(SOURCE)
(as noted by PM-77-1) and you could use MIN() or MAX() for the date column if that would be of use.
Upvotes: 1