Reputation: 377

regexp_substr Pattern for line splitting where separator is line break. Oracle 19c

How to correctly compose a mask for regexp_substr in order to get data of the form where each value is in a new line at the output? And how can I remove empty values from results, str field? This is the case when the last value is the beginning of the line. I do this, but this query does not return results, in regexp_substr I use the mask '[' || chr (10) || ']'.

select ft.field_id , regexp_substr(ft.validation_data,'['||chr(10)||']', 1, rownum) str 
            from mytable ft
            where ft.validation_data is not null
connect by rownum <= regexp_count(ft.validation_data,'['||chr(10)||']');

In the validation_data field of the mytable, the data each line looks like this:

-
ПДКП КА
ДКП
ДКП(Знач1/Знач2)
ПДУПА
ДУПА
<-There is a space here

Знач1/Знач2
Знач3/Знач4
Знач5/Знач6

Each value is separated by a line break. At the beginning of each line there can be a character - or one or more spaces. I need to get this view:

field_id	str
1	-
2	ПДКП
3	ДКП
4	ДКП(Знач1/Знач2)
5	ДУПА
6	ДКП(Знач1/Знач2)
7	Знач1/Знач2

etc

Upvotes: 0

Answers (3)

Barbaros Özhan

Reputation: 65363

Depending on the title and the CHR(10) character used, the expected result set seems wrong. Since you just want to split the strings by line feed character, no need duplicates, no need to extract the substring inside parentheses...

I presume that there is an id column coming from the table, and field_id column which is intended to be generated sequentially through use of rownum. But, the query will fail to yield decent results if rownum pseudocolumn is used to replace the value of field_id whenever the table has more than one rows. Then replace it with level keyword.

Considering all those facts, you can convert your query to the following one

 SELECT id, level AS field_id,
        REGEXP_SUBSTR(validation_data, '[^'||CHR(10)||']+', 1, level) AS str
   FROM mytable 
CONNECT BY level <= REGEXP_COUNT(validation_data, CHR(10))+1
    AND PRIOR SYS_GUID() IS NOT NULL
    AND PRIOR id = id

or alternatively

 SELECT id, column_value AS field_id,
        REGEXP_SUBSTR(validation_data, '[^'||CHR(10)||']+', 1, column_value) AS str
   FROM mytable,
        TABLE(CAST(MULTISET(SELECT level
                              FROM dual 
                           CONNECT BY level <= 
                     REGEXP_COUNT(validation_data,CHR(10))+1) AS sys.odcivarchar2list ))

Demo

Edit : if you want to get rid of the generated rows with whitespaces, then add

AND REGEXP_SUBSTR(RTRIM(validation_data), '[^'||CHR(10)||']+', 1,level) IS NOT NULL

to the first,

and

WHERE REGEXP_SUBSTR(RTRIM(validation_data), '[^'||CHR(10)||']+', 1,column_value) IS NOT NULL

to the second queries.

Upvotes: 1

oracloid_epta

Reputation: 11

Or this:

with mytable as (
select 1 field_id,
'1asdasdad5
 2asdasd
3ФЫВФЫВ' validation_data
from dual
)
select ft.field_id, regexp_substr(ft.validation_data, '[^'||chr(10)||']+',1,level) str
from mytable ft
where ft.validation_data is not null
connect by rownum <= regexp_count(ft.validation_data, chr(10))+1

Upvotes: 1

Dornaut

Reputation: 553

If your values do not contain whitespaces inside you can simply use negated whitespace class

select rownum, regexp_substr(tt, '\S+', 1, rownum)
from (
select '-
ПДКП
ДКП
ДКП(Знач1/Знач2)
ПДУПА
ДУПА' tt from dual
)
connect by level <= regexp_count(tt, '\S+');

https://dbfiddle.uk/?rdbms=oracle_11.2&fiddle=bd6c0270ce753da4033e79d4bd5b18c3

If can - use '[^'||chr(13)||chr(10)||']+'

Upvotes: 1

regexp_substr Pattern for line splitting where separator is line break. Oracle 19c

Answers (3)

Related Questions