Lizardie
Lizardie

Reputation: 210

how to join bigger table T1 to a smaller T2 with random repeating rows from T2

sorry if this in some way has been asked already. i've searched and couldn't find this specific solution. so i would appreciate an answer or pointer to the right place... i have two tables of different (varying) length. Toy example:

T1:

SELECT * FROM T1;
 gid | call_s1 
-----+---------
   1 |       1
   3 |       1
   4 |       1
   7 |       1
   8 |       1
(5 rows)

and

   SELECT * FROM T2;
     gid |       dt_ping       
    -----+---------------------
       1 | 2009-06-06 19:00:00
       2 | 2009-06-06 19:00:15
       3 | 2009-06-06 19:00:30
       4 | 2009-06-06 19:00:45
    (4 rows)

I would like to get a result T3 that has assigned random rows from T2.dt_ping to each row in T1 repeating if necessary. For instance possible result would be:

 gid | call_s1  | dt_ping
-----+----------+---------
   1 |       1  | 2009-06-06 19:00:45
   3 |       1  | 2009-06-06 19:00:30
   4 |       1  | 2009-06-06 19:00:15
   7 |       1  | 2009-06-06 19:00:00
   8 |       1  | 2009-06-06 19:00:45

I have tried offset, order by random, etc. Either I get a cartesian product or nulls. For instance this is my last attempt:

SELECT
  T1.gid
  , T1.call_s1
  , T2.dt_ping
FROM
  (
    SELECT
      gid
      , call_s1
      , ceiling(  random() * (SELECT count(*)::int as n FROM fake_called_small )   )  tgid
    FROM fake_called_small
  ) T1
  LEFT OUTER JOIN
  fake_times_small T2
    ON
      T2.gid = T1.tgid;

and one of the results i get:

 gid | call_s1 |       dt_ping       
-----+---------+---------------------
   1 |       1 | 
   3 |       1 | 2009-06-06 19:00:30
   4 |       1 | 2009-06-06 19:00:45
   7 |       1 | 2009-06-06 19:00:15
   8 |       1 | 
(5 rows)

I know I'm missing something simple, but what?

btw i tried this:

SELECT
  T1.gid
  , T1.call_s1
  , (SELECT dt_ping FROM fake_times_small T2 ORDER BY random() LIMIT 1)
FROM fake_called_small  T1;

and got:

 gid | call_s1 |       dt_ping       
-----+---------+---------------------
   1 |       1 | 2009-06-06 19:00:30
   3 |       1 | 2009-06-06 19:00:30
   4 |       1 | 2009-06-06 19:00:30
   7 |       1 | 2009-06-06 19:00:30
   8 |       1 | 2009-06-06 19:00:30
(5 rows)

same row for dt_ping repeated because the subselect is done only once

Upvotes: 0

Views: 41

Answers (1)

Fabian Pijcke
Fabian Pijcke

Reputation: 3210

If you are not concerned with performance, the following query should do what you want:

SELECT gid, call_s1, dt_ping
FROM (
  SELECT t1.gid, call_s1, dt_ping,
    ROW_NUMBER() OVER (PARTITION BY t1.gid, call_s1 ORDER BY RANDOM()) AS rn
  FROM t1
  CROSS JOIN t2
) x
WHERE rn = 1;

Upvotes: 2

Related Questions