Steven Ogilvie
Steven Ogilvie

Reputation: 11

SQL Insert not aggregating the same as Select statement

I have an SQL Insert statement that takes data from various other tables and out joins. The query is run daily and populates from these staging tables.

I have an issue in that where field WSL_TNA_CNT is not null, it creates an additional row for historic data and so after 2 years, I will have 700+ rows for this product/date/vendor combo, one row will have all the relevant fields populated, except WSL_TNA_CNT. One row will have all 0's for the other fields, yet have a value for WSL_TNA_CNT. The rest of the rows will all just be 0's for all fields, and null for WSL_TNA_CNT.

I've tried numerous was to prevent this happening with no results, and so I have tried to use a CTE on the insert, which also failed. I have also then tried creating a further staging table, and reaggregating it on insert to my final table and this doesn't work. If I take the select statement (from the insert to my final table from the new staging table) - it aggregates correctly, however when it's run as an insert, I get numerous rows mimicking the above.

Can anyone shed some light on why this might be happening, and how I could go about fixing it? Ultimately the data when I use it is accurate, but the table is being populated with a lot of 'useless' rows which will just inflate over time.

This is my staging table insert (the original final table)

```
insert into /*+ APPEND */ qde500_staging
select
  drv.actual_dt,
  cat.department_no,
  sub.prod_category_no,
  drv.product_code,
  drv.vendor_no,
  decode(grn.qty_ordered,null,0,grn.qty_ordered),
  decode(grn.qty_delivered,null,0,grn.qty_delivered),
  decode(grn.qty_ordered_sl,null,0,grn.qty_ordered_sl),
  decode(grn.wsl_qty_ordered,null,0,grn.wsl_qty_ordered),
  decode(grn.wsl_qty_delivered,null,0,grn.wsl_qty_delivered),
  decode(grn.wsl_qty_ordered_sl,null,0,grn.wsl_qty_ordered_sl),
  decode(grn.brp_qty_ordered,null,0,grn.brp_qty_ordered),
  decode(grn.brp_qty_delivered,null,0,grn.brp_qty_delivered),
  decode(grn.brp_qty_ordered_sl,null,0,grn.brp_qty_ordered_sl),
  decode(sal.wsl_sales_value,null,0,sal.wsl_sales_value),
  decode(sal.wsl_cases_sold,null,0,sal.wsl_cases_sold),
  decode(sal.brp_sales_value,null,0,sal.brp_sales_value),
  decode(sal.brp_cases_sold,null,0,sal.brp_cases_sold),
  decode(sal.csl_ordered,null,0,sal.csl_ordered),
  decode(sal.csl_delivered,null,0,sal.csl_delivered),
  decode(sal.csl_ordered_sl,null,0,sal.csl_ordered_sl),
  decode(sal.csl_delivered_sl,null,0,sal.csl_delivered_sl),
  decode(sal.catering_ordered,null,0,sal.catering_ordered),
  decode(sal.catering_delivered,null,0,sal.catering_delivered),
  decode(sal.catering_ordered_sl,null,0,sal.catering_ordered_sl),
  decode(sal.catering_delivered_sl,null,0,sal.catering_delivered_sl),
  decode(sal.retail_ordered,null,0,sal.retail_ordered),
  decode(sal.retail_delivered,null,0,sal.retail_delivered),
  decode(sal.retail_ordered_sl,null,0,sal.retail_ordered_sl),
  decode(sal.retail_delivered_sl,null,0,sal.retail_delivered_sl),
  decode(sal.sme_ordered,null,0,sal.sme_ordered),
  decode(sal.sme_delivered,null,0,sal.sme_delivered),
  decode(sal.sme_ordered_sl,null,0,sal.sme_ordered_sl),
  decode(sal.sme_delivered_sl,null,0,sal.sme_delivered_sl),
  decode(sal.dcsl_ordered,null,0,sal.dcsl_ordered),
  decode(sal.dcsl_delivered,null,0,sal.dcsl_delivered),
  decode(sal.nat_ordered,null,0,sal.nat_ordered),
  decode(sal.nat_delivered,null,0,sal.nat_delivered),
  decode(stk.wsl_stock_cases,null,0,stk.wsl_stock_cases),
  decode(stk.wsl_stock_value,null,0,stk.wsl_stock_value),
  decode(stk.brp_stock_cases,null,0,stk.brp_stock_cases),
  decode(stk.brp_stock_value,null,0,stk.brp_stock_value),
  decode(stk.wsl_ibt_stock_cases,null,0,stk.wsl_ibt_stock_cases),
  decode(stk.wsl_ibt_stock_value,null,0,stk.wsl_ibt_stock_value),
  decode(stk.wsl_intran_stock_cases,null,0,stk.wsl_intran_stock_cases),
  decode(stk.wsl_intran_stock_value,null,0,stk.wsl_intran_stock_value),
  decode(pcd.status_9_pcodes,null,0,pcd.status_9_pcodes),
  decode(pcd.pcodes_in_stock,null,0,pcd.pcodes_in_stock),
  decode(gtk.status_9_pcodes,null,0,gtk.status_9_pcodes),
  decode(gtk.pcodes_in_stock,null,0,gtk.pcodes_in_stock),
  NULL,
  tna.tna_reason_code,
  decode(tna.wsl_tna_count,null,0,tna.wsl_tna_count),
  NULL,
  decode(cap.cap_order_qty,null,0,cap.cap_order_qty),
  decode(cap.cap_alloc_cap_ded,null,0,cap.cap_alloc_cap_ded),
  decode(cap.cap_sell_block_ded,null,0,cap.cap_sell_block_ded),
  decode(cap.cap_sit_ded,null,0,cap.cap_sit_ded),
  decode(cap.cap_cap_ded_qty,null,0,cap.cap_cap_ded_qty),
  decode(cap.cap_fin_order_qty,null,0,cap.cap_fin_order_qty),
  decode(cap.cap_smth_ded_qty,null,0,cap.cap_smth_ded_qty),
  decode(cap.brp_sop2_tna_qty,null,0,cap.brp_sop2_tna_qty)
from
  qde500_driver   drv,
  qde500_sales2   sal,
  qde500_stock    stk,
  qde500_grn_data grn,
  qde500_pcodes_out_of_stock_agg pcd,
  qde500_gtickets_out_of_stock2 gtk,
  qde500_wsl_tna tna,
  qde500_capping cap,
  warehouse.dw_product  prd,
  warehouse.dw_product_sub_category sub,
  warehouse.dw_product_merchandising_cat mch,
  warehouse.dw_product_category cat
where
    drv.product_code = prd.product_code
and prd.prod_merch_category_no = mch.prod_merch_category_no
and mch.prod_sub_category_no = sub.prod_sub_category_no
and sub.prod_category_no = cat.prod_category_no
and drv.product_code = grn.product_code(+)
and drv.product_code = sal.product_code(+)
and drv.actual_dt = grn.actual_dt(+)
and drv.actual_dt = sal.actual_dt(+)
and drv.vendor_no = sal.vendor_no(+)
and drv.vendor_no = grn.vendor_no(+)
and drv.product_code = stk.product_code(+)
and drv.actual_dt = stk.actual_dt(+)
and drv.vendor_no = stk.vendor_no(+)
and drv.product_code = pcd.product_code(+)
and drv.actual_dt = pcd.actual_dt(+)
and drv.vendor_no = pcd.vendor_no(+)
and drv.product_code = gtk.product_code(+)
and drv.actual_dt = gtk.actual_dt(+)
and drv.vendor_no = gtk.vendor_no(+)
and drv.product_code = tna.product_code(+)
and drv.actual_dt = tna.actual_dt(+)
and drv.vendor_no = tna.vendor_no(+)
and drv.product_code = cap.product_code(+)
and drv.actual_dt = cap.actual_dt(+)
and drv.vendor_no = cap.vendor_no(+)
;
```

Then in a bid to re-aggregate it, I have done the below, which works as the 'Select' but not as an Insert.

```
select
actual_dt,
department_no, 
prod_category_no, 
product_code,
vendor_no,
sum(qty_ordered),
sum(qty_delivered),
sum(qty_ordered_sl),
sum(wsl_qty_ordered),
sum(wsl_qty_delivered),
sum(wsl_qty_ordered_sl),
sum(brp_qty_ordered),
sum(brp_qty_delivered),
sum(brp_qty_ordered_sl),
sum(wsl_sales_value),
sum(wsl_cases_sold),
sum(brp_sales_value),
sum(brp_cases_sold),
sum(csl_ordered),
sum(csl_delivered),
sum(csl_ordered_sl),
sum(csl_delivered_sl),
sum(catering_ordered),
sum(catering_delivered),
sum(catering_ordered_sl),
sum(catering_delivered_sl),
sum(retail_ordered),
sum(retail_delivered),
sum(retail_ordered_sl),
sum(retail_delivered_sl),
sum(sme_ordered),
sum(sme_delivered),
sum(sme_ordered_sl),
sum(sme_delivered_sl),
sum(dcsl_ordered),
sum(dcsl_delivered),
sum(nat_ordered),
sum(nat_delivered),
sum(wsl_stock_cases),
sum(wsl_stock_value),
sum(brp_stock_cases),
sum(brp_stock_value),
sum(wsl_ibt_stock_cases),
sum(wsl_ibt_stock_value),
sum(wsl_intran_stock_cases),
sum(wsl_intran_stock_value),
sum(status_9_pcodes),
sum(pcode_in_stock),
sum(gt_status_9),
sum(gt_in_stock),
gt_product,
tna_reason_code,
sum(tna_wsl_pcode_cnt),
sum(tna_brp_pcode_cnt),
sum(cap_order_qty),
sum(cap_alloc_cap_ded),
sum(cap_sell_block_ded),
sum(cap_sit_ded),
sum(cap_cap_ded_qty),
sum(cap_fin_order_qty),
sum(cap_smth_ded_qty),
sum(brp_sop2_tna_qty)
from 
qde500_staging
group by
actual_dt,
department_no, 
prod_category_no, 
product_code,
vendor_no,
tna_reason_code,
gt_product
```

**Edit, I've been asked to provide a minimal reproducible example - It's a 1,700 line script across 27 tables, but I'll try.

The "TNA" data is only held for one day in the data warehouse, and so it is kept in my temp table qde500_wsl_tna as a history over time. It runs through a multi stage process in which all the prior tables are dropped daily after being populated, and so on a day by day basis only yesterdays data is available. qde500_wsl_tna is not dropped/truncated in order to retain the history.

create table qde500_wsl_tna (
actual_dt           DATE,  
product_code        VARCHAR2(7),
vendor_no           NUMBER(5),
tna_reason_code     VARCHAR2(2),
wsl_tna_count       NUMBER(4)
)
storage ( initial 10M next 1M )
;

The insert for this being

insert into /*+ APPEND */ qde500_wsl_tna
select
  tna1.actual_dt,
  tna1.product_code,
  tna1.vendor_no,
  tna1.reason_code,
  sum(tna2.wsl_tna_count)
from
  qde500_wsl_tna_pcode_prob_rsn tna1,
  qde500_wsl_tna_pcode_count tna2
where
  tna1.actual_dt = tna2.actual_dt
and tna1.product_code = tna2.product_code
and tna1.product_Code not in ('P092198','P118189', 'P117935', 'P117939', 'P092182', 'P114305', 'P114307', 'P117837', 'P117932', 'P119052', 'P092179', 'P092196', 'P126340', 'P126719', 'P126339', 'P126341', 'P195238', 'P125273', 'P128205', 'P128208', 'P128209', 'P128210', 'P128220', 'P128250', 'P141152', 'P039367', 'P130616', 'P141130', 'P143820', 'P152404', 'P990788', 'P111951', 'P040860', 'P211540', 'P141152')
group by
  tna1.actual_dt,
  tna1.product_code,
  tna1.vendor_no,
  tna1.reason_code
;

The source tables for this are just aggregation of branches containing the TNA and a ranking of the reason for the TNA, as we only want the largest of the reason codes to give a single row per date/product/vendor combo.

select * from qde500_wsl_tna
where actual_dt = '26-aug-2024';

qde500_wsl_tna

ACTUAL_DT PRODUCT_CODE VENDOR_NO TNA_REASON_CODE WSL_TNA_COUNT
26/08/2024 00:00 P470039 20608 I 27
26/08/2024 00:00 P191851 14287 I 1
26/08/2024 00:00 P045407 19981 I 1
26/08/2024 00:00 P760199 9975 I 3
26/08/2024 00:00 P179173 18513 T 3
26/08/2024 00:00 P113483 59705 I 16
26/08/2024 00:00 P166675 58007 I 60
26/08/2024 00:00 P166151 4268 I 77
26/08/2024 00:00 P038527 16421 I 20

This has no duplicates before it feeds into qde500_staging.

However, when I run my insert, I get the following:

insert into qde500_Staging - I cannot create an example table as SO thinks it's code and should be indented, when it's not, so the image will need to do

Then, if I run just the select in my IDE I get

ACTUAL_DT DEPARTMENT_NO PROD_CATEGORY_NO PRODUCT_CODE VENDOR_NO QTY_ORDERED QTY_DELIVERED QTY_ORDERED_SL GT_PRODUCT TNA_REASON_CODE TNA_WSL_PCODE_CNT
26/08/2024 00:00 8 885 P179173 18513 1649 804 2624 T 3

The create table for my staging is as follows (truncated to reduce complexity):

create table qde500_staging (
actual_dt          DATE,
department_no      NUMBER(2), 
prod_category_no   NUMBER(4), 
product_code       VARCHAR2(7),
vendor_no          NUMBER(7),
qty_ordered        NUMBER(7,2),
qty_delivered      NUMBER(7,2),
qty_ordered_sl     NUMBER(7,2),
gt_product         VARCHAR2(1),
tna_reason_code    VARCHAR2(2),
tna_wsl_pcode_cnt NUMBER(4)
)
;

Upvotes: 1

Views: 102

Answers (2)

Lajos Arpad
Lajos Arpad

Reputation: 76943

So, the problem is stated as

I have an issue in that where field WSL_TNA_CNT is not null, it creates an additional row for historic data and so after 2 years, I will have 700+ rows for this product/date/vendor combo, one row will have all the relevant fields populated, except WSL_TNA_CNT. One row will have all 0's for the other fields, yet have a value for WSL_TNA_CNT. The rest of the rows will all just be 0's for all fields, and null for WSL_TNA_CNT.

So, you have:

  • WSL_TNA_CNT is not null
  • creates an additional row of historic data
  • one row populates everything, except WSL_TNA_CNT
  • the rest of the rows will have 0 for all fields and null for WSL_TNA_CNT

The solution: you will need to add criteria to your where that will be false for all the records you do not want and true for all the records that you do want. If having 0 for some fields and null for WSL_TNA_CNT is invalid, then you will need something like

(
    WSL_TNA_CNT IS NOT NULL AND
    field1 > 0 AND
    field2 > 0 AND
    ...
    ...
)

In order to make the further specification like in the above, you will need to carefully study

  1. The records that you intend to insert
  2. The records that you did not intend to insert

Make sure that you carefully study the differences between the two until you are confident that you can write a criteria that will always be true for the records you wand and never be true for the records you did not want. Once you are ready, write the criteria like in the above and append it to your WHERE. Make sure you write a minimalistic code, so if, for example 10 logical expressions with AND are enough, don't add an 11th one.

Upvotes: 0

d r
d r

Reputation: 7846

Regarding the basic question:
If I got it right, you say that certain Select statement that results with one row in your IDE when run as Insert Into staging table inserts more then that one row. It seems that something has to be different: either the data in source table or the select doing insert is different or already existing data in target table is not as expected or something else - something is just not as it seems to be.

Regarding the sample data provided:

qde500_wsl_tna is not dropped/truncated in order to retain the history.

Could it be possible that your history data containes some "junk" rows that you wouldn't expect. Lets say that your qde500_wsl_tna sample data provided has two additional rows where the date part of the actual_dt column is the same but the time part differs.

--    S a m p l e    D a t a :      (with two more rows added)
Insert Into qde500_wsl_tna 
  ( Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P470039', 20608, 'I', 27 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P191851', 14287, 'I',  1 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P045407', 19981, 'I',  1 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P760199',  9975, 'I',  3 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P179173', 18513, 'T',  3 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P113483', 59705, 'I', 16 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P166675', 58007, 'I', 60 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P166151',  4268, 'I', 77 From Dual Union All
    Select To_Date('26/08/2024 00:00', 'dd/mm/yyyy hh24:mi'), 'P038527', 16421, 'I', 20 From Dual
   UNION ALL   -- possible rows with the same date part of actual_dt column but the time part is different  
    Select To_Date('26/08/2024 00:00:15', 'dd/mm/yyyy hh24:mi:ss'), 'P179173', 16421, 'T', null From Dual Union All
    Select To_Date('26/08/2024 00:00:27', 'dd/mm/yyyy hh24:mi:ss'), 'P179173', 16421, 'T', null From Dual
  );

This has no duplicates before it feeds into qde500_staging.

You offered a select for the 26-aug-2024 like this ...

select * from qde500_wsl_tna
where actual_dt = '26-aug-2024';

... which results exactly as in the table just below the code and that doesn't fetch the last two rows added to your sample data. You could fetch them if your Select was like here ...

select * from qde500_wsl_tna
where TRUNC(actual_dt) = To_Date('26-aug-2024', 'dd-mon-yyyy');

... next, if the data looks like that then when you run the insert command ...

The insert for this being

insert into /*+ APPEND */ qde500_wsl_tna
select
  tna1.actual_dt,
  tna1.product_code,
  tna1.vendor_no,
  tna1.reason_code,
  sum(tna2.wsl_tna_count)
from
  qde500_wsl_tna_pcode_prob_rsn tna1,
  qde500_wsl_tna_pcode_count tna2
where
  tna1.actual_dt = tna2.actual_dt
and tna1.product_code = tna2.product_code
and tna1.product_Code not in ('P092198','P118189', 'P117935', 'P117939', 'P092182', 'P114305', 'P114307', 'P117837', 'P117932', 'P119052', 'P092179', 'P092196', 'P126340', 'P126719', 'P126339', 'P126341', 'P195238', 'P125273', 'P128205', 'P128208', 'P128209', 'P128210', 'P128220', 'P128250', 'P141152', 'P039367', 'P130616', 'P141130', 'P143820', 'P152404', 'P990788', 'P111951', 'P040860', 'P211540', 'P141152')
group by
  tna1.actual_dt,
  tna1.product_code,
  tna1.vendor_no,
  tna1.reason_code

... it will result with "additional rows" because of the grouping by actual_dt along with other columns ...

See the fiddle here.

NOTE:
I'm not claiming that this is the case, but something like this could result as you described it in your question.
Check your data thoroughly to be sure that they are OK.

Upvotes: 0

Related Questions