Reputation: 832
I have a reviews
table as follows:
r_id | my_comment |
---|---|
1 | Boxes with the TID 823 cannot exceed 40 kg |
2 | Parcel with the marking tid 63157 must not make the weight go over 31 k.g |
3 | Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg |
4 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
5 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
I am trying to match 2 things. The TID and the weight (kg). As you can see there are a 3 things to keep in mind
kg
and k.g
and written in 2 ways <weight> <kg or k.g>
<weight><kg or k.g>
(one with space, one without space)TID<id>
or TID <id>
(one with a space, one without a space.So I am able to extract the TID
and the weight if the comment only has 1 weight and 1 TID. However, if it has multiple, I fail to do so. So I want to separate the multiple into different rows.
This is my desired output
r_id | tid | weight | my_comment |
---|---|---|---|
1 | 823 | 40 | Boxes with the TID 823 cannot exceed 40 kg |
2 | 63157 | 31 | Parcel with the marking tid 63157 must not make the weight go over 31 k.g |
3 | 104124 | 94.477 | Envelopes with TID 104124 and TID 92341 can't excel above 94.477kg |
3 | 92341 | Envelopes with TID 104124 and TID 92341 can't excel above 94.477kg | |
4 | 38204 | 45.4 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
4 | 8242602 | 92 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
5 | 94514 | 52 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
5 | 51 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
SQL to create table/dummy data:
CREATE TABLE reviews(
r_id number(3) NOT NULL,
my_comment VARCHAR(255) NOT NULL
);
INSERT INTO reviews (r_id, my_comment) VALUES (1, 'Boxes with the TID 823 cannot exceed 40 kg');
INSERT INTO reviews (r_id, my_comment) VALUES (2, 'Parcel with the marking tid 63157 must not make the weight go over 31 k.g');
INSERT INTO reviews (r_id, my_comment) VALUES (3, 'Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg');
INSERT INTO reviews (r_id, my_comment) VALUES (4, 'TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg');
INSERT INTO reviews (r_id, my_comment) VALUES (5, 'Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG');
In my attempt, I am able to extract the tid and weight, but only the first instance and not able to split it into rows.
SELECT
r_id,
REGEXP_SUBSTR (
REGEXP_SUBSTR (my_comment, '(tid).*?[0-9]+', 1, 1, 'i'),
'[0-9]+'
) as "tid",
REGEXP_SUBSTR (
REGEXP_SUBSTR (my_comment, '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)', 1, 1, 'i'),
'[0-9]+'
) as "weight"
FROM reviews;
Upvotes: 2
Views: 275
Reputation: 142798
I am able to extract the tid and weight, but only the first instance and not able to split it into rows.
Your query, modified:
tid
and weight
regexp_substr
's occurrence
parameter (was 1
, now it is column_value
)cross join
was added which "loops" through my_comment
as many times as there's greatest occurrence between tid
and kg
(in any form)
tid
and 1 kg
, it'll "loop" 2 timesconnect by level
clauseYou did tag the question as Oracle 10; I don't have it any more, but I know it doesn't support regexp_count
function. If that's really the case (you never answered Koen's question), then it won't work and you'll have to calculate number of tid
/weight
occurrences using some other way. I hope you aren't on 10g, though.
I ran this code in SQL*Plus. BREAK
is here just to nicely distinguish r_id
and my_comment
values, doesn't have any other purpose.
SQL> break on r_id on my_comment
SQL> SELECT r_id,
2 my_comment,
3 REGEXP_SUBSTR (REGEXP_SUBSTR (my_comment,
4 '(tid).*?[0-9]+',
5 1,
6 COLUMN_VALUE,
7 'i'),
8 '[0-9]+') AS "tid",
9 REGEXP_SUBSTR (
10 REGEXP_SUBSTR (
11 my_comment,
12 '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)',
13 1,
14 COLUMN_VALUE,
15 'i'),
16 '[0-9]+') AS "weight"
17 FROM reviews
18 CROSS JOIN
19 TABLE (
20 CAST (
21 MULTISET (
22 SELECT LEVEL
23 FROM DUAL
24 CONNECT BY LEVEL <= GREATEST (REGEXP_COUNT (my_comment, 'tid' , 1, 'i'),
25 REGEXP_COUNT (my_comment, '(kg|k.g)', 1, 'i')))
26 AS SYS.odcinumberlist));
which results in
R_ID MY_COMMENT tid weight
----- ------------------------------------------------------------------------- ------- -------
1 Boxes with the TID 823 cannot exceed 40 kg 823 40
2 Parcel with the marking tid 63157 must not make the weight go over 31 k.g 63157 31
3 Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg 104124 94
92341
4 TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg 38204 45
8242602 92
5 Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG 94514 52
51
8 rows selected.
SQL>
Upvotes: 2