user3422637
user3422637

Reputation: 4239

Oracle SQL: Parsing transaction amounts from field delimited record delimited string column

I have two columns in an oracle sql table where a lot of important transaction information is stored in a not so easily retrievable fashion. The dataset has two columns trxn_a and trxn_b.

trxn_a :

2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#

trxn_b :

2019-01-25~cash deposit~$3,000~John Doe#2019-01-25~cash deposit~$1,500~John Doe#2019-01-26~cash deposit~$100~John Doe#2019-01-26~cash deposit~$800~John Doe#2019-01-26~cash deposit~$100~John Doe#

As you would see that the fields are delimited by ~ and records are delimited by #. There can be any number of transactions (so, any number of # in a cell).

The data listed above is just one record of data in two columns (so, two cells).

My goal is to turn data this into multiple rows, each row will be an aggregation of sum(trxn_amount) by date. Please see desired output below:

 date, trxn_amt_a, trxn_amt_b
 2019-01-25, 5000, 4500
 2019-01-26, 1000, 1000

I tried INSTR and SUBSTR functions but that's not powerful to handle the variabilities in this structure of data. Plus, I am not sure how to :

  1. parse the dates, the transaction amounts
  2. aggregate amounts by date and
  3. then explode the cells into different rows

Upvotes: 1

Views: 103

Answers (2)

GMB
GMB

Reputation: 222622

This is a complicated one. Here is a step by step description of how I proceeded.


The first part consists in splitting each value into rows, using the # separator. For this we use REGEXP_SUBSTR() along with CONNECT BY to generate recursion.

select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null

Then, we need to parse each resulting value into columns. This can be done simply with a series of REGEXP_SUBSTR(). Special care needs to be taken with the column that contains amount values, that contain non-numeric characters ('$5,000') : invalid characters need to be removed, so the value can later on be treated as a number.

NB : for your purpose, you don't actually need to recover all 4 columns from the value (the date and amount are sufficient) ; I am showing all columns in case you ever need to access another one.

select
    'ta' src,
    regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
    regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
    replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
    regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
from (
    select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
    from mytable
    connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
)

Each column in the source table (trxn_a, trxn_b) must be processed separatly, since each value generates a random number of records. Results can be UNIONed, and then an outer query does conditional aggregation :

Finaly query :

with t as (
    select
        'ta' src,
        regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
        regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
        replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
        regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
    from (
        select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
        from mytable
        connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
    )
    union all
    select
        'tb' src,
        regexp_substr(trxn_b,'[^~]+', 1, 1) tdate,
        regexp_substr(trxn_b,'[^~]+', 1, 2) ttype,
        replace(regexp_substr(trxn_b,'[^$~]+', 1, 3), ',', '') tamount,
        regexp_substr(trxn_b,'[^~]+', 1, 4) tuser
    from (
        select trim(regexp_substr(trxn_b,'[^#]+', 1, level) ) trxn_b, level
        from mytable
        connect by regexp_substr(trxn_b, '[^#]+', 1, level) is not null
    )
)
select
    tdate, 
    SUM(DECODE(src, 'ta', tamount, 0)) trxn_amt_a,
    SUM(DECODE(src, 'tb', tamount, 0)) trxn_amt_b
from t
group by tdate;

With your test data, this demo on DB Fiddle yields :

TDATE       TRXN_AMT_A  TRXN_AMT_B
2019-01-25  5000        4500
2019-01-26  1000        1000

Upvotes: 2

Himanshu
Himanshu

Reputation: 3970

Use REGEXP_SUBSTR to split the records on #.Furthermore, I dont see how you got another column added perhaps magic in the op as you dont have in the input. Simply Replace(the_below_string,'~cash deposit~',',')

      SELECT DISTINCT REGEXP_SUBSTR 
       ('2019-01-25~cash deposit~$5,000~John 
        Doe#2019-. 
      01-26~cash deposit~$1,000~John Doe#',
       '[^,#]+',1,LEVEL) as "Data" 
        FROM   Table
         CONNECT BY REGEXP_SUBSTR 
       ('2019-01-25~cash deposit~$5,000~John Doe#
         2019-01-26~cash deposit~$1,000~John Doe#
          ','[^,#]+',1,LEVEL) IS NOT NULL

Upvotes: 1

Related Questions