Burhan Khalid Butt
Burhan Khalid Butt

Reputation: 275

Get String Before a Substring till nth Occurrences of a Character

I am working on a string where I am trying to get a string before any occurrence of substring "1:" till 3 occurrences of character "<" and 3 occurrences of character ">".

Following is the string:

<Affected Items.Lifecycle Phase>WAS<Items.Active>IS<Items.Obsolete>1:<対象アイテム.ライフサイクル フェーズ> は <アイテム.Active> から <アイテム.破棄> に変更されました。2:<受影響的項目.生命週期階段>原為<物料.Active>現為<物料.報廢>3:<Données techniques affectées.Phase de cycle de vie>ÉTAIT<Données techniques.Active>EST<Données techniques.Obsolète>4:<受影响的物件.生命周期阶段>原为<物件.Active>现为<物件.报废>5:<Betroffene Artikel.Lebenszyklusphase>WAR<Artikel.Active>IST<Artikel.Veraltet>6:<영향 받은 항목.수명 주기 단계>기존:<항목.Active>현재:<항목.폐기>7:<Задействованные элементы.Фаза жизненного цикла>ЯВЛЯЛСЯ<Элементы.Active>ЯВЛЯЕТСЯ<Элементы.Устарело>; <Affected Items.New Rev>WAS<>IS<1>1:<対象アイテム.新規リビジョン> は <> から <1> に変更されました。2:<受影響的項目.新版本>原為<>現為<1>3:<Données techniques affectées.Nouvelle révision>ÉTAIT<>EST<1>4:<受影响的物件.新版本>原为<>现为<1>5:<Betroffene Artikel.Neue Revision>WAR<>IST<1>6:<영향 받은 항목.새 수정 버전>기존:<>현재:<1>7:<Задействованные элементы.Новая ред.>ЯВЛЯЛСЯ<>ЯВЛЯЕТСЯ<1>

And I want the following substrings:

<Affected Items.Lifecycle Phase>WAS<Items.Active>IS<Items.Obsolete>

And

<Affected Items.New Rev>WAS<>IS<1>

I am able to get the first substring using below SQL. But struggling to get the second one.

WITH CTE AS
(
SELECT '<Affected Items.Lifecycle Phase>WAS<Items.Active>IS<Items.Obsolete>1:<対象アイテム.ライフサイクル フェーズ> は <アイテム.Active> から <アイテム.破棄> に変更されました。2:<受影響的項目.生命週期階段>原為<物料.Active>現為<物料.報廢>3:<Données techniques affectées.Phase de cycle de vie>ÉTAIT<Données techniques.Active>EST<Données techniques.Obsolète>4:<受影响的物件.生命周期阶段>原为<物件.Active>现为<物件.报废>5:<Betroffene Artikel.Lebenszyklusphase>WAR<Artikel.Active>IST<Artikel.Veraltet>6:<영향 받은 항목.수명 주기 단계>기존:<항목.Active>현재:<항목.폐기>7:<Задействованные элементы.Фаза жизненного цикла>ЯВЛЯЛСЯ<Элементы.Active>ЯВЛЯЕТСЯ<Элементы.Устарело>; <Affected Items.New Rev>WAS<>IS<1>1:<対象アイテム.新規リビジョン> は <> から <1> に変更されました。2:<受影響的項目.新版本>原為<>現為<1>3:<Données techniques affectées.Nouvelle révision>ÉTAIT<>EST<1>4:<受影响的物件.新版本>原为<>现为<1>5:<Betroffene Artikel.Neue Revision>WAR<>IST<1>6:<영향 받은 항목.새 수정 버전>기존:<>현재:<1>7:<Задействованные элементы.Новая ред.>ЯВЛЯЛСЯ<>ЯВЛЯЕТСЯ<1>'
AS "DETAILS"
FROM DUAL
)
select "DETAILS",
     CASE
          WHEN "DETAILS" != CONVERT ("DETAILS", 'US7ASCII')
            THEN
              CASE WHEN "DETAILS" LIKE '%1:%'
                THEN
               SUBSTR("DETAILS", 0, INSTR("DETAILS", '1:')-1)
               ELSE
                 REGEXP_REPLACE ("DETAILS", '[^ -~]', '')
                 END
             else
               "DETAILS" end as details

FROM CTE 

Any assistance would be appreciated!

Upvotes: 0

Views: 135

Answers (2)

MT0
MT0

Reputation: 167962

You can use:

WITH split (details, match, end_pos) AS (
  SELECT details,
         REGEXP_SUBSTR(details, '(.*?)\d+:', 1, 1, NULL, 1),
         REGEXP_INSTR(details, '(.*?)\d+:', 1, 1, 1)
  FROM   table_name
UNION ALL
  SELECT details,
         REGEXP_SUBSTR(details, '(.*?)\d+:', end_pos, 1, NULL, 1),
         REGEXP_INSTR(details, '(.*?)\d+:', end_pos, 1, 1)
  FROM   split
  WHERE  end_pos > 0
),
brackets (match, opening, closing) AS (
  SELECT match,
         INSTR(match, '<', -1, 3),
         INSTR(match, '>', -1, 3)
  FROM   split
  WHERE  end_pos > 0
),
last_3_brackets (match) AS (
  SELECT SUBSTR(match, LEAST(opening, closing)) AS match
  FROM   brackets
  WHERE  opening > 0
  AND    closing > 0
)
SELECT *
FROM   last_3_brackets
WHERE  match = CONVERT (match, 'US7ASCII')

Which, for the sample data:

CREATE TABLE table_name (details) AS
SELECT '<Affected Items.Lifecycle Phase>WAS<Items.Active>IS<Items.Obsolete>1:<対象アイテム.ライフサイクル フェーズ> は <アイテム.Active> から <アイテム.破棄> に変更されました。2:<受影響的項目.生命週期階段>原為<物料.Active>現為<物料.報廢>3:<Données techniques affectées.Phase de cycle de vie>ÉTAIT<Données techniques.Active>EST<Données techniques.Obsolète>4:<受影响的物件.生命周期阶段>原为<物件.Active>现为<物件.报废>5:<Betroffene Artikel.Lebenszyklusphase>WAR<Artikel.Active>IST<Artikel.Veraltet>6:<영향 받은 항목.수명 주기 단계>기존:<항목.Active>현재:<항목.폐기>7:<Задействованные элементы.Фаза жизненного цикла>ЯВЛЯЛСЯ<Элементы.Active>ЯВЛЯЕТСЯ<Элементы.Устарело>; <Affected Items.New Rev>WAS<>IS<1>1:<対象アイテム.新規リビジョン> は <> から <1> に変更されました。2:<受影響的項目.新版本>原為<>現為<1>3:<Données techniques affectées.Nouvelle révision>ÉTAIT<>EST<1>4:<受影响的物件.新版本>原为<>现为<1>5:<Betroffene Artikel.Neue Revision>WAR<>IST<1>6:<영향 받은 항목.새 수정 버전>기존:<>현재:<1>7:<Задействованные элементы.Новая ред.>ЯВЛЯЛСЯ<>ЯВЛЯЕТСЯ<1>' FROM DUAL;

Outputs:

MATCH
<Affected Items.Lifecycle Phase>WAS<Items.Active>IS<Items.Obsolete>
<Betroffene Artikel.Lebenszyklusphase>WAR<Artikel.Active>IST<Artikel.Veraltet>
<Affected Items.New Rev>WAS<>IS<1>
<Betroffene Artikel.Neue Revision>WAR<>IST<1>

db<>fiddle here

Upvotes: 1

Lajos Arpad
Lajos Arpad

Reputation: 76434

This article: https://blogs.oracle.com/sql/post/split-comma-separated-values-into-rows-in-oracle-database

explains how you can split a comma-separated value strings into rows.

So, we could use <Affected Items. as our separator.

Doing so we could either insert those rows into a temporary table or wrap a query around them, using this idea as a substring and extract the valuable part of each row, that is, from the very start up until the first occurrence of :.

The link I have given provides this example:

with rws as (
  select 'split,into,rows' str from dual
)
  select regexp_substr (
           str,
           '[^,]+',
           1,
           level
         ) value
  from   rws
  connect by level <= 
    length ( str ) - length ( replace ( str, ',' ) ) + 1;
    
VALUE   
split    
into     
rows

and provides the following explanation:

enter image description here

So, if you view <Affected Items. to be your separator (instead of ,) and the rows are generated, each row will contain a useful information up until its first :. At the very end you will need to prepend the separator to your results.

Upvotes: 2

Related Questions