DeadMonkey
DeadMonkey

Reputation: 813

Calculating percentages with GROUP BY query

I have a table with 3 columns which looks like this:

File    User     Rating (1-5)
------------------------------
00001    1        3
00002    1        4
00003    2        2
00004    3        5
00005    4        3
00005    3        2
00006    2        3
Etc.

I want to generate a query that outputs the following (for each user and rating, display the number of files as well as percentage of files):

User    Rating   Count   Percentage
-----------------------------------
1       1         3      .18
1       2         6      .35
1       3         8      .47
2       5         12     .75
2       3         4      .25

With Postgresql, I know how to create a query that includes the first 3 columns using the following query, but I can't figure out how to calculate percentage within the GROUP BY:

SELECT
    User,
    Rating,
    Count(*)
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

Here I want the percentage calculation to apply to each user/rating group.

Upvotes: 57

Views: 138840

Answers (7)

Salman Arshad
Salman Arshad

Reputation: 272086

You can use analytic functions on top of aggregate functions:

select
    user,
    rating,
    count(*) as total_per_user_rating_tuple,
    sum(count(*)) over (partition by user) as total_per_user,
    count(*) / sum(count(*)) over (partition by user) as subtotal_upon_total_per_user
from results
group by user, rating
order by user, rating

The query will work in any RDBMS that implements analytic (window) functions. Notice that aggregate functions are nested inside analytic functions. This works because of the order of operations in SQL.

Demo on DB<>Fiddle

Upvotes: 0

Tore Aurstad
Tore Aurstad

Reputation: 3806

If data is not too big, you might consider subqueries for ease of use. Make sure subquery queries same data.

SELECT User, [Rating], Count(*) AS COUNT,
100.0* (COUNT(*)/(SELECT COUNT(*) FROM Results) as 'Percentage'
FROM Results
GROUP BY User, [Rating]

I tested this on a similar table

select OrderStatus, Count(*) as Antall,
CASE 
    WHEN OrderStatus = 0 THEN 'Bestilt (Ordered'
    WHEN OrderStatus = 1 THEN 'Besvart (Completed)'
    WHEN OrderStatus = 2 THEN 'Utløpt (Expired)'
    WHEN OrderStatus = 3 THEN 'Feilet (Error)'
    WHEN OrderStatus = 4 THEN 'Ikke bestilt (NotOrdered)'
    WHEN OrderStatus = 5 THEN 'Fysisk post returnert uåpnet (PhysicalMailReturned)'
    WHEN OrderStatus = 6 THEN 'Prosesserer bestilling (Processing)'
    WHEN OrderStatus = 7 THEN 'Deleted (Slettet)'
END AS OrderStatusText,
100 * (COUNT(*) *1.0 / (SELECT COUNT(*) FROM PromsFormOrder where created >= '2023-11-29')) as 'Percentage rate'
FROM PromsFormOrder
where Created >= '2023-11-29'
GROUP BY OrderStatus

The fiddling with multiplying with 1.0 here is to make sure we switch to floating based arithmetic so our percentages are not 0 or 1 due to integer division.

Grouped data - calculate percentage

Using partitions over or windowed functions are probably faster than subqueries, but syntax is using more well-known constructs of SQL. The Rating column is a reserved word in T-SQL (SQL Server) so I had to use [Rating], in Postgres this is not required.

Upvotes: 0

Andrew Lazarus
Andrew Lazarus

Reputation: 19302

WITH t1 AS 
 (SELECT User, Rating, Count(*) AS n 
  FROM your_table
  GROUP BY User, Rating)
SELECT User, Rating, n, 
       (0.0+n)/(COUNT(*) OVER (PARTITION BY User)) -- no integer divide!
FROM t1;

Or

SELECT User, Rating, Count(*) OVER w_user_rating AS n, 
        (0.0+Count(*) OVER w_user_rating)/(Count(*) OVER (PARTITION BY User)) AS pct
FROM your_table
WINDOW w_user_rating AS (PARTITION BY User, Rating);

I would see if one of these or the other yields a better query plan with the appropriate tool for your RDBMS.

Upvotes: 53

Nicholas Carey
Nicholas Carey

Reputation: 74197

Alternatively, you can do the old-school way — arguably easier to grok:

select usr.User                   as User   ,
       usr.Rating                 as Rating ,
       usr.N                      as N      ,
       (100.0 * usr.N) / total.N as Pct
from ( select User, Rating , count(*) as N
       from Results
       group by User , Rating
     ) usr
join ( select User , count(*) as N
       from Results
       group by User
     ) total on total.User = usr.User
order by usr.User, usr.Rating

Cheers!

Upvotes: 13

James Holland
James Holland

Reputation: 1142

In TSQL this should work

SELECT
    User,
    Rating,
    Count(*), SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating) AS Total,
Count(*)/(SUM(COUNT(*)) OVER (PARTITION BY User, Rating ORDER BY User, Rating)) AS Percentage
FROM
    Results
GROUP BY
    User, Rating
ORDER BY
    User, Rating

Upvotes: 5

user3060544
user3060544

Reputation: 299

WITH data AS 
 (SELECT User, Rating, Count(*) AS Count 
  FROM Results
  GROUP BY User, Rating)
SELECT User, Rating, Count, 
       (0.0+n)/(SUM(Count) OVER (PARTITION BY User))
FROM data;

Upvotes: 0

mikerobi
mikerobi

Reputation: 20878

The best way to do this would be with window functions.

Upvotes: 3

Related Questions