Reputation: 813
I have a table with 3 columns which looks like this:
File User Rating (1-5)
------------------------------
00001 1 3
00002 1 4
00003 2 2
00004 3 5
00005 4 3
00005 3 2
00006 2 3
Etc.
I want to generate a query that outputs the following (for each user and rating, display the number of files as well as percentage of files):
User Rating Count Percentage
-----------------------------------
1 1 3 .18
1 2 6 .35
1 3 8 .47
2 5 12 .75
2 3 4 .25
With Postgresql, I know how to create a query that includes the first 3 columns using the following query, but I can't figure out how to calculate percentage within the GROUP BY:
SELECT
User,
Rating,
Count(*)
FROM
Results
GROUP BY
User, Rating
ORDER BY
User, Rating
Here I want the percentage calculation to apply to each user/rating group.
Upvotes: 57
Views: 138840
Reputation: 272086
You can use analytic functions on top of aggregate functions:
select
user,
rating,
count(*) as total_per_user_rating_tuple,
sum(count(*)) over (partition by user) as total_per_user,
count(*) / sum(count(*)) over (partition by user) as subtotal_upon_total_per_user
from results
group by user, rating
order by user, rating
The query will work in any RDBMS that implements analytic (window) functions. Notice that aggregate functions are nested inside analytic functions. This works because of the order of operations in SQL.
Upvotes: 0
Reputation: 3806
If data is not too big, you might consider subqueries for ease of use. Make sure subquery queries same data.
SELECT User, [Rating], Count(*) AS COUNT,
100.0* (COUNT(*)/(SELECT COUNT(*) FROM Results) as 'Percentage'
FROM Results
GROUP BY User, [Rating]
I tested this on a similar table
select OrderStatus, Count(*) as Antall,
CASE
WHEN OrderStatus = 0 THEN 'Bestilt (Ordered'
WHEN OrderStatus = 1 THEN 'Besvart (Completed)'
WHEN OrderStatus = 2 THEN 'Utløpt (Expired)'
WHEN OrderStatus = 3 THEN 'Feilet (Error)'
WHEN OrderStatus = 4 THEN 'Ikke bestilt (NotOrdered)'
WHEN OrderStatus = 5 THEN 'Fysisk post returnert uåpnet (PhysicalMailReturned)'
WHEN OrderStatus = 6 THEN 'Prosesserer bestilling (Processing)'
WHEN OrderStatus = 7 THEN 'Deleted (Slettet)'
END AS OrderStatusText,
100 * (COUNT(*) *1.0 / (SELECT COUNT(*) FROM PromsFormOrder where created >= '2023-11-29')) as 'Percentage rate'
FROM PromsFormOrder
where Created >= '2023-11-29'
GROUP BY OrderStatus
The fiddling with multiplying with 1.0 here is to make sure we switch to floating based arithmetic so our percentages are not 0 or 1 due to integer division.
Using partitions over or windowed functions are probably faster than subqueries, but syntax is using more well-known constructs of SQL. The Rating column is a reserved word in T-SQL (SQL Server) so I had to use [Rating], in Postgres this is not required.
Upvotes: 0
Reputation: 19302
WITH t1 AS
(SELECT User, Rating, Count(*) AS n
FROM your_table
GROUP BY User, Rating)
SELECT User, Rating, n,
(0.0+n)/(COUNT(*) OVER (PARTITION BY User)) -- no integer divide!
FROM t1;
Or
SELECT User, Rating, Count(*) OVER w_user_rating AS n,
(0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);
I would see if one of these or the other yields a better query plan with the appropriate tool for your RDBMS.
Upvotes: 53
Reputation: 74197
Alternatively, you can do the old-school way — arguably easier to grok:
select usr.User as User ,
usr.Rating as Rating ,
usr.N as N ,
(100.0 * usr.N) / total.N as Pct
from ( select User, Rating , count(*) as N
from Results
group by User , Rating
) usr
join ( select User , count(*) as N
from Results
group by User
) total on total.User = usr.User
order by usr.User, usr.Rating
Cheers!
Upvotes: 13
Reputation: 1142
In TSQL this should work
SELECT
User,
Rating,
Count(*), SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating) AS Total,
Count(*)/(SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating)) AS Percentage
FROM
Results
GROUP BY
User, Rating
ORDER BY
User, Rating
Upvotes: 5
Reputation: 299
WITH data AS
(SELECT User, Rating, Count(*) AS Count
FROM Results
GROUP BY User, Rating)
SELECT User, Rating, Count,
(0.0+n)/(SUM(Count) OVER (PARTITION BY User))
FROM data;
Upvotes: 0