user3044431
user3044431

Reputation:

Case When not correcting?

The data from a table in our database is full of errors. I am trying to code a case, when, then clause that could potentially replace the errors with data that is correct, and from another table. Keep in mind that I have pretty basic privileges so I cant code anything more complex than case, when and then. The code I have so far is shown below:

select distinct t.*,
case
    --** t.Corridor_RB  is full of bad entries
    when t.Corridor_RB <> s.Corridor_RB 
        then s.Corridor_RB --** what I want the data to be
        else s.Corridor_RB
        end as CorrectRb
from PROJECT_CORRS_RB_MILES t left join
corridors_grouped_tis s on 
t.Corridor_RB = s.Corridor_RB
where t.Direction <> 5
order by t.Corridor_RB

Whats happening right now is the added column CorrectRb is still full of nulls as shown in the screen shot below:

enter image description here

So why is this happening? Why for example are the C000001N not correcting to C000001E from s.Corridor_RB? Shouldnt the empty spaces be data from s.Corridor_RB? The join itself is fine, its a one to many.

Here is data from corridors_grouped_tis:

TCR_CORRIDOR_ID CORRIDOR_RB SLICED
        C000001 C000001E    1
        C000002 C000002E    2
        C000003 C000003N    3
        C000004 C000004N    4
        C000005 C000005N    5
        C000006 C000006E    6
        C000007 C000007N    7
        C000008 C000008E    8
        C000009 C000009N    9
        C000010 C000010N    10
        C000011 C000011N    11
        C000012 C000012E    12

Output from projects_corrs_rb_miles is shown below:

         FED_ST_PRJ_NBR CONT_ID       ROUTE_NBR DIRECTION   CORRIDOR_RB BEG_TERMINI END_TERMINI
   1    HSIP-MT 1-6(76)384              11713           N-1 3   C000001E    384.0   386.6
   2    MT-NH 1-3(71)255                21216           N-1 3   C000001E    254.6   256.0
   3    HSIP-MT 1-6(76)384              11713           N-1 3   C000001E    384.0   386.6
   4    HSIP-MT 1-6(76)384              11713           N-1 3   C000001E    384.0   386.6
   5    MT-NH 1-3(71)255                21216           N-1 3   C000001E    254.6   256.0
   6    MT-NH 1-3(71)255                21216           N-1 3   C000001E    254.6   256.0
   7    HSIP 1-2(147)136                07C11           N-1 1   C000001N         
   8    NH 1-1(93)16                    08913           N-1 1   C000001N    16.0    44.8
   9    CBI-NHTSA-MT 1-1(79)45          03812           N-1 1   C000001N    44.9    48.6
  10    CBI-NHTSA-MT 1-1(79)45          03812           N-1 1   C000001N    44.9    48.6

Upvotes: 0

Views: 37

Answers (1)

Alex Poole
Alex Poole

Reputation: 191315

At the moment your case expression is a bit pointless since both branches evaluate to the same thing, s.Corridor_RB.

But that will be null if there is no matching row in s for a t row, and because you've included join condition

t.Corridor_RB = s.Corridor_RB

you will only get a match if they are the same, i.e. you will go in to the 'else' branch. For everything else there is no match on s, so s.Corridor_RB will be null. (Which will also go into the 'else', as you can't compare null to anything else with in/equality operators).

Essentially, it can only ever be null, or exactly the same as t.Corridor_RB.

You can probably just remove that condition, and it might be OK as an inner join; but then you will see that you have no other join conditions between s and t, so you will get a cartesian product (cross join) between the two tables, which is unlikely to be what you want.

You need to know how to identify a row in s which will have the correct value, based on some other column(s) in t. You can't base it on the column you know has bad entries - unless s only has one row, or has mappings between bad and good values, and you join on or selected the wrong column. As we can't see that table structure or data it's impossible to tell what you need that join to really be.


Based on the data from corridors_grouped_tis you added, it looks like you want to match based on the root, and just change the last character:

from PROJECT_CORRS_RB_MILES t left join
corridors_grouped_tis s on 
s.TCR_CORRIDOR_ID = substr(t.Corridor_RB, 1, length(t.Corridor_RB) - 1)

if the TCR_CORRIDOR_ID does what it appears to; or perhaps

substr(s.Corridor_RB, 1, length(s.Corridor_RB) - 1)
  = substr(t.Corridor_RB, 1, length(t.Corridor_RB) - 1)

or similar, though this is making assumptions about your patterns and how to match still.

Applying a function to a column value will generally prevent any index on that column being used, but if this is a small look-up table it probably won't matter and you're likely to hit a large proportion of rows anyway, making a full scan preferable. If there was a lot of data to work with you could add an indexed virtual column that does the substring, but it seems like overkill here.

Demo with your sample data as CTEs:

with corridors_grouped_tis(TCR_CORRIDOR_ID, CORRIDOR_RB, SLICED) as (
  select 'C000001', 'C000001E', 1 from dual
  union all select 'C000002', 'C000002E', 2 from dual
  union all select 'C000003', 'C000003N', 3 from dual
  union all select 'C000004', 'C000004N', 4 from dual
  union all select 'C000005', 'C000005N', 5 from dual
  union all select 'C000006', 'C000006E', 6 from dual
  union all select 'C000007', 'C000007N', 7 from dual
  union all select 'C000008', 'C000008E', 8 from dual
  union all select 'C000009', 'C000009N', 9 from dual
  union all select 'C000010', 'C000010N', 10 from dual
  union all select 'C000011', 'C000011N', 11 from dual
  union all select 'C000012', 'C000012E', 12 from dual
),
project_corrs_rb_miles (FED_ST_PRJ_NBR, CONT_ID, ROUTE_NBR, DIRECTION, CORRIDOR_RB, BEG_TERMINI, END_TERMINI) as (
  select 'HSIP-MT 1-6(76)384', '11713', 'N-1', 3, 'C000001E', 384.0, 386.6 from dual
  union all select 'MT-NH 1-3(71)255', '21216', 'N-1', 3, 'C000001N', 254.6, 256.0 from dual
  union all select 'HSIP-MT 1-6(76)384', '11713', 'N-1', 3, 'C000001E', 384.0, 386.6 from dual
  union all select 'HSIP-MT 1-6(76)384', '11713', 'N-1', 3, 'C000001E', 384.0, 386.6 from dual
  union all select 'MT-NH 1-3(71)255', '21216', 'N-1', 3, 'C000001E', 254.6, 256.0 from dual
  union all select 'MT-NH 1-3(71)255', '21216', 'N-1', 3, 'C000001E', 254.6, 256.0 from dual
  union all select 'HSIP 1-2(147)136', '07C11', 'N-1', 1, 'C000001N', null, null from dual
  union all select 'NH 1-1(93)16', '08913', 'N-1', 1, 'C000001N', 16.0, 44.8 from dual
  union all select 'CBI-NHTSA-MT 1-1(79)45', '03812', 'N-1', 1, 'C000001N', 44.9, 48.6 from dual
  union all select 'CBI-NHTSA-MT 1-1(79)45', '03812', 'N-1', 1, 'C000001N', 44.9, 48.6 from dual
)
select t.*, s.Corridor_RB as CorrectRb
from PROJECT_CORRS_RB_MILES t left join
corridors_grouped_tis s on 
s.TCR_CORRIDOR_ID = substr(t.Corridor_RB, 1, length(t.Corridor_RB) - 1)
where t.Direction <> 5
order by t.Corridor_RB
/

FED_ST_PRJ_NBR         CONT_ ROU  DIRECTION CORRIDOR BEG_TERMINI END_TERMINI CORRECTR
---------------------- ----- --- ---------- -------- ----------- ----------- --------
MT-NH 1-3(71)255       21216 N-1          3 C000001E       254.6         256 C000001E
HSIP-MT 1-6(76)384     11713 N-1          3 C000001E         384       386.6 C000001E
HSIP-MT 1-6(76)384     11713 N-1          3 C000001E         384       386.6 C000001E
MT-NH 1-3(71)255       21216 N-1          3 C000001E       254.6         256 C000001E
HSIP-MT 1-6(76)384     11713 N-1          3 C000001E         384       386.6 C000001E
MT-NH 1-3(71)255       21216 N-1          3 C000001N       254.6         256 C000001E
HSIP 1-2(147)136       07C11 N-1          1 C000001N                         C000001E
NH 1-1(93)16           08913 N-1          1 C000001N          16        44.8 C000001E
CBI-NHTSA-MT 1-1(79)45 03812 N-1          1 C000001N        44.9        48.6 C000001E
CBI-NHTSA-MT 1-1(79)45 03812 N-1          1 C000001N        44.9        48.6 C000001E

10 rows selected. 

Upvotes: 0

Related Questions