Polyphase29
Polyphase29

Reputation: 503

Oracle SQL splitting single rows into multiples by delimiter

I'm working on a JDBC/Oracle SQL project and have ran into what I think is my final issue. I have a large table of movies and their categories, listed as follows:

Table name = Categories
MovieId          MovieTitle            Category
1                Die Hard              Action|Drama
2                GoodFellas            Drama|Crime

I'm trying to make it so that it will look like this:

MovieId          MovieTitle            Category
1                Die Hard              Action
1                Die Hard              Drama
2                Goodfellas            Drama
2                Goodfellas            Crime

Here's the general idea I've been playing around with. When I run it on my large table it seems to endlessly run, I tested it on a smaller table and it did what I wanted, however it only displayed the result for Goodfellas and not Die Hard. Keep in mind I need it to update my original table, not just create a query.

SELECT distinct MovieId, MovieTitle,REGEXP_SUBSTR(Category,'[^|$]+', 1, LEVEL) Category

FROM Categories

CONNECT BY REGEXP_SUBSTR(Category, '[^|$]+', 1, LEVEL) IS NOT NULL;

Any help is appreciated. I've looked through other threads on SO and a lot of them deal with sorting through smaller tables of data, not larger ones like mine. Thanks!

Upvotes: 0

Views: 60

Answers (2)

Gary_W
Gary_W

Reputation: 10360

Here's another way. Instead of updating, why not just truncate the table and reload it?

SQL> with Movie(id,title,genre) as (
     select 1, 'Die Hard',   'Action|Drama' from dual union
     select 2, 'GoodFellas', 'Drama|Crime'  from dual
   )
   SELECT Id, title, --column_value substring_nbr,
          regexp_substr(genre, '(.*?)(\||$)', 1, column_value, null, 1) genre
   FROM category,
        TABLE(
          CAST(
            MULTISET(SELECT LEVEL
                     FROM dual
                     CONNECT BY LEVEL <= REGEXP_COUNT(genre, '\|')+1
                     ) AS sys.OdciNumberList
                   )
               )
   ORDER BY Id, title --, substring_nbr
   ;

        ID TITLE      GENRE
---------- ---------- ------------
         1 Die Hard   Action
         1 Die Hard   Drama
         2 GoodFellas Drama
         2 GoodFellas Crime

SQL>

While this answers your question, its not a good database design. Let's take a step back. You have an entity of a Movie with attributes of ID and Title, and an entity of Genre with attributes that describe the entity of ID and description. A movie can have many genres and a genre can used by many movies so that is a many to many relationship. To model that, you take the primary key (the attribute that uniquely identifies an instance of an entity) from each entity and together as foreign keys they make a primary key of what's called an associative table. In this case I'd call it movie_genre. So for your test data I would model this with 3 tables:

Movie                  Genre              Movie_Genre
-----                  -----              -----------
movieID  TITLE         genreID  DESC      movieID  genreID
1        Die Hard      1        Action    1        1
2        GoodFellas    2        Drama     1        2
                       3        Crime     2        2
                                          2        2

Upvotes: 1

Trung Duong
Trung Duong

Reputation: 3475

You could try recursive CTE without regex

WITH result_data (MovieId, MovieTitle, Category, StartPosition, EndPosition) 
     AS (SELECT MovieId, 
                MovieTitle, 
                Category, 
                1, 
                INSTR(Category, '|') 
         FROM   Categories 
         UNION ALL 
         SELECT MovieId, 
                MovieTitle, 
                Category, 
                EndPosition + 1, 
                INSTR(Category, '|', EndPosition + 1) 
         FROM   result_data 
         WHERE  EndPosition > 0) 
SELECT MovieId, 
       MovieTitle, 
       SUBSTR(Category, StartPosition, 
              DECODE(EndPosition, 0, LENGTH(Category) + 1, EndPosition) - StartPosition) AS Category 
FROM   result_data 
ORDER  BY MovieId, StartPosition 

You could check a demo here

Upvotes: 2

Related Questions