How to select pairs of rows from a table in sqlite?

Question

I have the following schema:

    CREATE TABLE video_segments (
        video_id TEXT,
        segment_num INTEGER,
        data BLOB
    )

I want to select 100 pairs of rows from the table, such that each pair contains 2 segments from the same video, and no 2 pairs are from the same video.

What would be the best query for this?

I tried doing:

WITH CTE AS (
  SELECT video_id
  FROM video_segments
  GROUP BY video_id
  HAVING COUNT(*) >= 2
)
SELECT *
FROM video_segments
WHERE video_id IN (
  SELECT video_id
  FROM CTE
  ORDER BY RANDOM()
  LIMIT 100
)
ORDER BY RANDOM()
LIMIT 200;

but this did not work because sometimes there would be 0 rows with a given video id, and sometimes there would be more than 2 rows with a video id.

Sampled data:

video_id|segment_num|data|
foo      0            
foo      1            
foo      2            
foo      3            
bar      0            
bar      1            
baz      0            
baz      1            
baz      2

Lets say I wanted to select 3 random pairs. A valid result might be:

foo 0 
foo 2 
bar 0 
bar 1 
baz 0 
baz 2

since results should be random. The result should be different each time.

Jon Armstrong · Accepted Answer

Something like this should do:

The fiddle

Assume vid <= 10 are the videos chosen and
We wish to select 2 sid's per selected vid.
Randomly assign a row number to each sid within each vid separately.
Then pick the first 2 row numbers (ord < 3) for each of those vids

Note: The first recursive CTE term is only to create test data for segments.

WITH segments (sid, vid) AS (
        SELECT 0  , 1 UNION ALL
        SELECT sid+1, 1 + (sid+1)/10 FROM segments WHERE sid < 100
     )
   , cte1 AS (
        SELECT t.*, row_number() OVER (PARTITION BY vid ORDER BY random()) AS ord
          FROM segments AS t
         WHERE vid <= 10
     )
SELECT * FROM cte1
 WHERE ord < 3
 ORDER BY vid, ord
;

Now applying that to your schema and initial logic, we have something like this:

The updated fiddle

WITH cte AS (
      SELECT video_id
        FROM video_segments
       GROUP BY video_id
      HAVING COUNT(*) >= 2
     )
   , cte1 AS (
      SELECT *, row_number() OVER (PARTITION BY video_id ORDER BY random()) AS ord
        FROM video_segments
       WHERE video_id IN (
                SELECT video_id
                  FROM cte
                 ORDER BY RANDOM()
                 LIMIT 100
             )
     )
SELECT * FROM cte1
 WHERE ord < 3
 ORDER BY video_id, ord
;

Fiddle updated with the new data

Result 1:

video_id	segment_num	data	ord
bar	1	null	1
bar	0	null	2
baz	0	null	1
baz	2	null	2
foo	1	null	1
foo	0	null	2

Result 2:

video_id	segment_num	data	ord
bar	1	null	1
bar	0	null	2
baz	1	null	1
baz	0	null	2
foo	2	null	1
foo	3	null	2

etc.

The larger set of data generated in the first fiddle test case shows the random behavior a bit better.

How to select pairs of rows from a table in sqlite?

Answers (2)

Related Questions