Konstantin
Konstantin

Reputation: 3159

SQL query to split and keep only the top N values

I have the following table data:

| name  |items     |
--------------------
| Bob   |1, 2, 3   |
| Rick  |5, 3, 8, 4|
| Bill  |2, 4      |

I need to create a table with a split items column, but with the limitation to have at most N items per name. E.g. for N = 3 the table should look like this:

|name |item|
-----------
|Bob  |1   |
|Bob  |2   |
|Bob  |3   |
|Rick |5   | 
|Rick |3   | 
|Rick |8   |
|Bill |2   |
|Bill |4   |


I have the following query that splits items correctly, but doesn't account for the maximum number N. What should I modify in the query (standard SQL, BigQuery) to account for N?

WITH data_split AS (
    SELECT name, SPLIT(items,',') AS item
    FROM (
        SELECT name, items
        -- A lot of additional logic here
        FROM data
    )
)

SELECT name, item
FROM data_split
CROSS JOIN UNNEST(data_split.item) AS item

Upvotes: 1

Views: 205

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Consider below approach (BigQuery)

select name, trim(item) item
from your_table, unnest(split(items)) item with offset 
where offset < 3       

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

marcothesane
marcothesane

Reputation: 6741

You can try a more semi-standard way - works practically everywhere:

WITH
-- your input ...                                                                                                                                                                                   
indata(id,nam,items) AS (             -- need a sorting column "id" to keep the sort order
          SELECT 1, 'Bob' ,'1,2,3'    -- blanks after comma can irritate 
UNION ALL SELECT 2, 'Rick','5,3,8,4'  -- the splitting function below ...
UNION ALL SELECT 3, 'Bill','2,4'
)
-- real query starts here, replace comma below with "WITH" ...
,
-- exactly 3 integers
i(i) AS (
          SELECT 1   -- need to add FROM DUAL , in Oracle, for example ...
UNION ALL SELECT 2
UNION ALL SELECT 3
)
SELECT
  id
, nam
, SPLIT(items,',',i) AS item -- SPLIT_PART in other DBMS-s
FROM indata CROSS JOIN i 
WHERE SPLIT_PART(items,',',i) <> ''
ORDER BY 1, 3
;
-- out  id | nam  | item 
-- out ----+------+------
-- out   1 | Bob  | 1
-- out   1 | Bob  | 2
-- out   1 | Bob  | 3
-- out   2 | Rick | 3
-- out   2 | Rick | 5
-- out   2 | Rick | 8
-- out   3 | Bill | 2
-- out   3 | Bill | 4

Upvotes: 2

Related Questions