R S
R S

Reputation: 11879

SQL random aggregate

Say I have a simple table with 3 fields: 'place', 'user' and 'bytes'. Let's say, that under some filter, I want to group by 'place', and for each 'place', to sum all the bytes for that place, and randomly select a user for that place (uniformly from all the users that fit the 'where' filter and the relevant 'place'). If there was a "select randomly from" aggregate function, I would do:

SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;

...but I couldn't find such an aggregate function. Am I missing something? What could be a good way to achieve this?

Upvotes: 11

Views: 5590

Answers (4)

jgomo3
jgomo3

Reputation: 1223

With a custom aggregate function, you could write expressions as simple as:

SELECT place, SUM(bytes), SELECT_AT_RANDOM(user) WHERE .... GROUP BY place;

SELECT_AT_RAMDOM would be the custom aggregate function.

Here is precisely an implementation in PostgreSQL.

Upvotes: 1

fthiella
fthiella

Reputation: 49089

I think your question is DBMS specific. If your DBMS is MySql, you can use a solution like this:

SELECT place_rand.place, SUM(place_rand.bytes), place_rand.user as random_user
FROM
  (SELECT place, bytes, user
   FROM place
   WHERE ...
   ORDER BY rand()) place_rand
GROUP BY
  place_rand.place;

The subquery orders records in random order. The outer query groups by place, sums bytes, and returns first random user, since user is not in an aggregate function and neither in the group by clause.

Upvotes: 3

Gordon Linoff
Gordon Linoff

Reputation: 1270593

I would do a bit of a variation on Martin's solution:

select place, sum(bytes), max(case when seqnum = 1 then user end) as random_user
from (select place, bytes,
             row_number() over (partition by place order by newid()) as sequm
       from t
      ) t
group by place

(Where newid() is just one way to get a random number, depending on the database.)

For some reason, I prefer this approach, because it still has the aggregation function in the outer query. If you are summarizing a bunch of fields, then this seems cleaner to me.

Upvotes: 0

Martin Smith
Martin Smith

Reputation: 453658

If your RDBMS supports analytical functions.

WITH T
     AS (SELECT place,
                Sum(bytes) OVER (PARTITION BY place) AS Sum_bytes,
                user,
                Row_number() OVER (PARTITION BY place ORDER BY random_function()) AS RN
         FROM   YourTable
         WHERE  .... )
SELECT place,
       Sum_bytes,
       user
FROM   T
WHERE  RN = 1; 

For SQL Server Crypt_gen_random(4) or NEWID() would be examples of something that could be substituted in for random_function()

Upvotes: 5

Related Questions