Reputation: 159
I have a table structure that contains data in column in comma separated values. I want to split these values into rows.
The data in table is something like this
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN |ATTACHED_SUPPLIER_DOCUMENT |ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|--------------------|---------------------------|------------------------|
001059-0000 |CHENGDA |268452,268453 |268456,268457,268459,268460|268465 |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458,268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451,268452,268453|268456,268457,268459,268460|268465,268466 |
I have tried this query but it is not behaving well not sure the reason
WITH CTE AS(
SELECT DISTINCT
MP.PART_NUMBER MANUFACTURER_PART_NUMBER, M.NAME MANUFACTURER_NAME,
RTRIM(LTRIM(PG3.MULTILIST31, ','), ',') TEST_PLAN,
RTRIM(LTRIM(PG3.MULTILIST32, ','), ',') ATTACHED_SUPPLIER_DOCUMENT,
RTRIM(LTRIM(PG3.MULTILIST33, ','), ',') ATTACHED_LOGITECH_REPORT
FROM MANU_PARTS MP
INNER JOIN MANUFACTURERS M ON M.ID = MP.MANU_ID
LEFT JOIN PAGE_TWO PG2 ON PG2.ID = MP.ID
LEFT JOIN PAGE_THREE PG3 ON PG3.ID = MP.ID
WHERE PART_NUMBER = '001059-0000'
)
SELECT DISTINCT
MANUFACTURER_PART_NUMBER, MANUFACTURER_NAME,
REGEXP_SUBSTR(TEST_PLAN, '[^,]+', 1, LEVEL) TEST_PLAN,
REGEXP_SUBSTR(ATTACHED_SUPPLIER_DOCUMENT, '[^,]+', 1, LEVEL) ATTACHED_SUPPLIER_DOCUMENT,
REGEXP_SUBSTR(ATTACHED_LOGITECH_REPORT, '[^,]+', 1, LEVEL) ATTACHED_LOGITECH_REPORT
FROM CTE
CONNECT BY REGEXP_SUBSTR(NVL(NVL(TEST_PLAN, ATTACHED_SUPPLIER_DOCUMENT), ATTACHED_LOGITECH_REPORT), '[^,]+', 1, LEVEL) IS NOT NULL
ORDER BY MANUFACTURER_NAME
It should be giving the following output which is required too
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000 |CHENGDA |268452 |268456 |268465 |
001059-0000 |CHENGDA |268453 |268457 | |
001059-0000 |CHENGDA | |268459 | |
001059-0000 |CHENGDA | |268460 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451 |268456 |268465 |
001059-0000 |SYSTRON |268452 |268457 |268466 |
001059-0000 |SYSTRON |268453 |268459 | |
001059-0000 |SYSTRON | |268460 | |
But instead it is giving this out put and missing 2 rows or values for 1st CHENGDA
and last 1 row of SYSTRON
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000 |CHENGDA |268452 |268456 |268465 |
001059-0000 |CHENGDA |268453 |268457 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268462 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SYSTRON |268451 |268456 |268465 |
001059-0000 |SYSTRON |268452 |268457 |268466 |
001059-0000 |SYSTRON |268453 |268459 | |
The reason which I found out is when I swap the position of Test Plan
and Attached Supplier Document
in Connect By clause which probably is because of number of values in both columns. But this should no be happening because what if Test Plan had more values in any case ?
Hope someone could help.
(Before marking this question duplicate please do ask if it already has an answer or not because that query did not work)
Upvotes: 1
Views: 256
Reputation: 159
WITH CTE AS(
SELECT DISTINCT
MP.PART_NUMBER MANUFACTURER_PART_NUMBER, M.NAME MANUFACTURER_NAME,
RTRIM(LTRIM(PG3.MULTILIST31, ','), ',') TEST_PLAN,
RTRIM(LTRIM(PG3.MULTILIST32, ','), ',') ATTACHED_SUPPLIER_DOCUMENT,
RTRIM(LTRIM(PG3.MULTILIST33, ','), ',') ATTACHED_LOGITECH_REPORT
FROM MANU_PARTS MP
INNER JOIN MANUFACTURERS M ON M.ID = MP.MANU_ID
LEFT JOIN PAGE_TWO PG2 ON PG2.ID = MP.ID
LEFT JOIN PAGE_THREE PG3 ON PG3.ID = MP.ID
WHERE PART_NUMBER = '001059-0000'
)
SELECT MANUFACTURER_PART_NUMBER, MANUFACTURER_NAME,
REGEXP_SUBSTR(TEST_PLAN, '[^,]+', 1, column_value) TEST_PLAN,
REGEXP_SUBSTR(ATTACHED_SUPPLIER_DOCUMENT, '[^,]+', 1, column_value) ATTACHED_SUPPLIER_DOCUMENT,
REGEXP_SUBSTR(ATTACHED_LOGITECH_REPORT, '[^,]+', 1, column_value) ATTACHED_LOGITECH_REPORT
FROM CTE
CROSS JOIN TABLE(CAST(MULTISET(SELECT LEVEL FROM DUAL
CONNECT BY LEVEL <= GREATEST(REGEXP_COUNT(NVL(ATTACHED_LOGITECH_REPORT, 0), ','), REGEXP_COUNT(NVL(TEST_PLAN, 0), ','), REGEXP_COUNT(NVL(ATTACHED_SUPPLIER_DOCUMENT, 0), ','))+1
) AS sys.odcinumberlist))
I have managed to get the desired result by checking the maximum count of the column and allocating that to each item.
Here is the output.
MANUFACTURER_PART_NUMBER|MANUFACTURER_NAME |TEST_PLAN|ATTACHED_SUPPLIER_DOCUMENT|ATTACHED_LOGITECH_REPORT|
------------------------|-------------------------|---------|--------------------------|------------------------|
001059-0000 |CHENGDA |268452 |268456 |268465 |
001059-0000 |CHENGDA |268453 |268457 | |
001059-0000 |CHENGDA | |268459 | |
001059-0000 |CHENGDA | |268460 | |
001059-0000 |SYSTRON |268451 |268456 |268465 |
001059-0000 |SYSTRON |268452 |268457 |268466 |
001059-0000 |SYSTRON |268453 |268459 | |
001059-0000 |SYSTRON | |268460 | |
001059-0000 |SUZHOU SHARETECH | | | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268458 | |
001059-0000 |SHANGHAI MARRISON CO.,LTD| |268462 | |
Upvotes: 0
Reputation: 191275
As pointed out in comments, your data model isn't ideal, and you shouldn't be storing lists of values in strings. Assuming you're stuck with that, your problem is that the lists have different numbers of elements, so the depth can be different for each, leading to different limits on level
; and trying to guess which to base it on is going to be a problem, as you've already realised.
You could possibly try to figure out which value to use for each ID using regexp_count
and greatest
but that's going to be tricky with multiple rows to deal with as well. (The distinct
is an indication that you already have an issue there...)
Another approach is to split up each list individually and then try to associate matching elements - though that seems fairly arbitrary in itself. This seems to get the result you want though:
select mp.part_number,
m.name,
t.value31,
t.value32,
t.value33
from manu_parts mp
join manufacturers m
on m.id = mp.manu_id
left join (
select coalesce(t31.id, t32.id, t33.id) as id,
coalesce(t31.lvl, t32.lvl, t33.lvl) as lvl,
t31.value as value31,
t32.value as value32,
t33.value as value33
from
(
select id, level as lvl,
regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist31, '(.*?)(,|$)')
) t31
full outer join (
select id, level as lvl,
regexp_substr(multilist32, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist32, '(.*?)(,|$)')
) t32
on t32.id = t31.id and t32.lvl = t31.lvl
full outer join (
select id, level as lvl,
regexp_substr(multilist33, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist33, '(.*?)(,|$)')
) t33
on (t33.id = t32.id and t33.lvl = t32.lvl)
or (t33.id = t31.id and t33.lvl = t31.lvl)
) t on t.id = mp.id
where part_number = '001059-0000'
order by part_number, name, lvl;
PART_NUMBER NAME VALUE31 VALUE32 VALUE33
----------- ------------------------- ---------- ---------- ----------
001059-0000 CHENGDA 268452 268456 268465
001059-0000 CHENGDA 268453 268457
001059-0000 CHENGDA 268459
001059-0000 CHENGDA 268460
001059-0000 SHANGHAI MARRISON CO.,LTD 268458
001059-0000 SHANGHAI MARRISON CO.,LTD 268462
001059-0000 SUZHOU SHARETECH
001059-0000 SYSTRON 268451 268456 268465
001059-0000 SYSTRON 268452 268457 268466
001059-0000 SYSTRON 268453 268459
001059-0000 SYSTRON 268460
11 rows selected.
Each subquery like:
(
select id, level as lvl,
regexp_substr(multilist31, '(.*?)(,|$)', 1, level, null, 1) as value
from page_three
connect by id = prior id
and prior dbms_random.value is not null
and level < regexp_count(multilist31, '(.*?)(,|$)')
) t31
splits a single column from that table, preserving the ID to join against later, and the level to associate with the other columns for that ID. The three of those are then outer joined together; and the result of that is joined to the main table.
The performance might be horrible, but if you're filtering on a single part number then that might be pushed into the subqueries so it doesn't do more work than needed.
You could also look at recursive subquery factoring instead of hierarchical queries, or possibly correlated subqueries with collections, but they will all have their own issues. That's what happens when you have to work around a bad data model.
Upvotes: 1