Reputation: 35

Randomize Two Data Sets

I am trying to come up with a way to assign two people together from a larger dataset of about 6 people. I was toying around with the random() function in postgres but had no luck. I have access to postgres or oracle whichever may be easier to accomplish this. For example if I had 6 names I'd like to take those 6 names and assign them to one another using some sort of randomizing query:

Billy
Bob
Joe
Sam
John
Alex

The output would be something along the lines of:

Orignal Name | Match

Billy | Alex
Bob | Joe
Joe | John
Sam | Bob
John | Billy
Alex | Sam

Any help would be greatly appreciated!

Thank you.

Upvotes: 1

Answers (1)

JNevill

Reputation: 50200

In postgres you could generate a row_number() on a random number and then join on that. This is nice and fast, but it could cause people to be buddied up with themselves:

SELECT t1.name, t2.name
FROM (SELECT row_number() OVER (ORDER BY random()) as id, name FROM table) t1
INNER JOIN (SELECT row_number() OVER (order by random()) as id, name FROM table) t2 
     ON t1.id = t2.id;

Here is a method using the cartesian product that results from joining the table to itself. This is not a nice solution if the data is huge as there is an intermediate result set that is (N * (N - 1)) rows, but noone will get matched up with themselves:

SELECT name1,
    name2
FROM (
    SELECT t1.NAME name1,
        t2.NAME name2,
        row_number() OVER (PARTITION BY t1.NAME ORDER BY random()) AS rn
    FROM yourtable t1,
        yourtable t2
    WHERE t1.NAME <> t2.NAME
    ) subquery
WHERE rn = 1;

Here is a hybrid of the two. Joining the table to itself on a range of randomly generated ids, also specifying that the names don't match. The intermediate result set will have 1-3 randomly chosen names from t2 for each name in t1. Then we just grab one at random. This has an intermediate result set which will ALWAYS be less than (N*3) records which isn't too bad.

UPDATE: this will, however, match the same person up multiple times... Leaving it here in case it spawns so good idea for that INNER JOIN that prevents that from happening.

WITH randnames AS 
(
    SELECT row_number() OVER (ORDER BY random()) AS id,
        NAME
    FROM yourtable
)
SELECT name1, name2
FROM (
    SELECT t1.NAME name1,
        t2.NAME name2,
        ROW_NUMBER() OVER (PARTITION BY t1.NAME ORDER BY 1) AS rn
    FROM randnames t1
      INNER JOIN randnames t2
        ON t1.NAME <> t2.NAME
            AND t2.id BETWEEN t1.id - 1 AND t1.id + 1
    ) subquery
WHERE rn = 1;

I feel like there is probably some prettier way to do this, but the complete lack of answers on this question an hour after it was asked suggests that it's not an easy problem to solve in SQL.

Upvotes: 1

Randomize Two Data Sets

Answers (1)

Related Questions