Reputation: 4239
I have two columns in an oracle sql table where a lot of important transaction information is stored in a not so easily retrievable fashion. The dataset has two columns trxn_a and trxn_b.
trxn_a :
2019-01-25~cash deposit~$5,000~John Doe#2019-01-26~cash deposit~$1,000~John Doe#
trxn_b :
2019-01-25~cash deposit~$3,000~John Doe#2019-01-25~cash deposit~$1,500~John Doe#2019-01-26~cash deposit~$100~John Doe#2019-01-26~cash deposit~$800~John Doe#2019-01-26~cash deposit~$100~John Doe#
As you would see that the fields are delimited by ~
and records are delimited by #
. There can be any number of transactions (so, any number of #
in a cell).
The data listed above is just one record of data in two columns (so, two cells).
My goal is to turn data this into multiple rows, each row will be an aggregation of sum(trxn_amount)
by date. Please see desired output below:
date, trxn_amt_a, trxn_amt_b
2019-01-25, 5000, 4500
2019-01-26, 1000, 1000
I tried INSTR
and SUBSTR
functions but that's not powerful to handle the variabilities in this structure of data. Plus, I am not sure how to :
Upvotes: 1
Views: 103
Reputation: 222622
This is a complicated one. Here is a step by step description of how I proceeded.
The first part consists in splitting each value into rows, using the #
separator. For this we use REGEXP_SUBSTR()
along with CONNECT BY
to generate recursion.
select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
Then, we need to parse each resulting value into columns. This can be done simply with a series of REGEXP_SUBSTR()
. Special care needs to be taken with the column that contains amount values, that contain non-numeric characters ('$5,000'
) : invalid characters need to be removed, so the value can later on be treated as a number.
NB : for your purpose, you don't actually need to recover all 4 columns from the value (the date and amount are sufficient) ; I am showing all columns in case you ever need to access another one.
select
'ta' src,
regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
from (
select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
)
Each column in the source table (trxn_a
, trxn_b
) must be processed separatly, since each value generates a random number of records. Results can be UNION
ed, and then an outer query does conditional aggregation :
Finaly query :
with t as (
select
'ta' src,
regexp_substr(trxn_a,'[^~]+', 1, 1) tdate,
regexp_substr(trxn_a,'[^~]+', 1, 2) ttype,
replace(regexp_substr(trxn_a,'[^$~]+', 1, 3), ',', '') tamount,
regexp_substr(trxn_a,'[^~]+', 1, 4) tuser
from (
select trim(regexp_substr(trxn_a,'[^#]+', 1, level) ) trxn_a, level
from mytable
connect by regexp_substr(trxn_a, '[^#]+', 1, level) is not null
)
union all
select
'tb' src,
regexp_substr(trxn_b,'[^~]+', 1, 1) tdate,
regexp_substr(trxn_b,'[^~]+', 1, 2) ttype,
replace(regexp_substr(trxn_b,'[^$~]+', 1, 3), ',', '') tamount,
regexp_substr(trxn_b,'[^~]+', 1, 4) tuser
from (
select trim(regexp_substr(trxn_b,'[^#]+', 1, level) ) trxn_b, level
from mytable
connect by regexp_substr(trxn_b, '[^#]+', 1, level) is not null
)
)
select
tdate,
SUM(DECODE(src, 'ta', tamount, 0)) trxn_amt_a,
SUM(DECODE(src, 'tb', tamount, 0)) trxn_amt_b
from t
group by tdate;
With your test data, this demo on DB Fiddle yields :
TDATE TRXN_AMT_A TRXN_AMT_B
2019-01-25 5000 4500
2019-01-26 1000 1000
Upvotes: 2
Reputation: 3970
Use REGEXP_SUBSTR
to split the records on #
.Furthermore, I dont see how you got another column added perhaps magic in the op as you dont have in the input. Simply Replace(the_below_string,'~cash deposit~',',')
SELECT DISTINCT REGEXP_SUBSTR
('2019-01-25~cash deposit~$5,000~John
Doe#2019-.
01-26~cash deposit~$1,000~John Doe#',
'[^,#]+',1,LEVEL) as "Data"
FROM Table
CONNECT BY REGEXP_SUBSTR
('2019-01-25~cash deposit~$5,000~John Doe#
2019-01-26~cash deposit~$1,000~John Doe#
','[^,#]+',1,LEVEL) IS NOT NULL
Upvotes: 1