user1865913
user1865913

Reputation:

SQL Oracle, get difference result IN and BETWEEN

could someone explain this case? example, i have a dump table with data like this:

TGL
19810909
19761026
19832529

when i execute with this query:

SELECT to_date(tgl,'YYYYMMDD') tgl
    FROM 
        (
        SELECT tgl  
        FROM tmpx
        WHERE 
        SUBSTR(tgl,5,2) BETWEEN '01' AND '12'
        AND length(tgl) = 8
        )
    WHERE to_date(tgl,'YYYYMMDD') < to_date('19811231','YYYYMMDD')

result: no error

TGL
09/09/1981
26/10/1976

but, when i execute with this query:

SELECT to_date(tgl,'YYYYMMDD') tgl
FROM 
    (
    SELECT tgl  
    FROM tmpx
    WHERE 
    SUBSTR(tgl,5,2) IN ('01','02','03','04','05','06','07','08','09','10','01','12')
    AND length(tgl) = 8
    )
WHERE to_date(tgl,'YYYYMMDD') < to_date('19811231','YYYYMMDD')

result: error

ORA-01843: not a valid month

why the row number third (19832529) include in selection that causes an error? whereas if I execute the following query:

SELECT tgl  
FROM tmpx
WHERE 
SUBSTR(tgl,5,2) IN ('01','02','03','04','05','06','07','08','09','10','11','12')
AND length(tgl) = 8

the result is like this (wihtout row number 3)

TGL
19810909
19761026

thank you.

Upvotes: 0

Views: 56

Answers (1)

Alex Poole
Alex Poole

Reputation: 191570

If you look at the execution plans for both queries you can see how they are being handled by the optimiser. For the first one:

--------------------------------------------------------------------------                                              
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |                                              
--------------------------------------------------------------------------                                              
|   0 | SELECT STATEMENT  |      |     1 |     6 |     3   (0)| 00:00:01 |                                              
|*  1 |  TABLE ACCESS FULL| TMPX |     1 |     6 |     3   (0)| 00:00:01 |                                              
--------------------------------------------------------------------------                                              

Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     

   1 - filter(LENGTH("TGL")=8 AND SUBSTR("TGL",5,2)>='01' AND                                                           
              SUBSTR("TGL",5,2)<='12' AND TO_DATE("TGL",'YYYYMMDD')<TO_DATE('                                           
              1981-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))   

And for the second:

--------------------------------------------------------------------------                                              
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |                                              
--------------------------------------------------------------------------                                              
|   0 | SELECT STATEMENT  |      |     1 |     6 |     3   (0)| 00:00:01 |                                              
|*  1 |  TABLE ACCESS FULL| TMPX |     1 |     6 |     3   (0)| 00:00:01 |                                              
--------------------------------------------------------------------------                                              

Predicate Information (identified by operation id):                                                                     
---------------------------------------------------                                                                     

   1 - filter(LENGTH("TGL")=8 AND TO_DATE("TGL",'YYYYMMDD')<TO_DATE('                                                   
              1981-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND                                                       
              (SUBSTR("TGL",5,2)='01' OR SUBSTR("TGL",5,2)='02' OR                                                      
              SUBSTR("TGL",5,2)='03' OR SUBSTR("TGL",5,2)='04' OR                                                       
              SUBSTR("TGL",5,2)='05' OR SUBSTR("TGL",5,2)='06' OR                                                       
              SUBSTR("TGL",5,2)='07' OR SUBSTR("TGL",5,2)='08' OR                                                       
              SUBSTR("TGL",5,2)='09' OR SUBSTR("TGL",5,2)='10' OR                                                       
              SUBSTR("TGL",5,2)='12'))         

Notice the order that the filters are applied. In the first one it's looking at the substring first, and only the values that pass that filter will then be converted to a date for the 1998 comparison.

In the second one the date check is being done first, so it tries to convert the invalid value before it filters it out.

The real problem here is storing dates as string, which allows invalid data to be entered. If you're stuck with that then another approach is to use a function to attempt to convert the string to a date and ignore the error(s) thrown, which still isn't ideal but would ignore the same values you already are. There are lots of examples of this, including this one of mine. With something like that you could do:

SELECT safe_to_date(tgl) tgl
FROM tmpx
WHERE safe_to_date(tgl) < date '1981-12-31';

or if you prefer:

SELECT tgl
FROM (
  SELECT safe_to_date(tgl) tgl
  FROM tmpx
)
WHERE tgl < date '1981-12-31';

Your function could only look for YYYYMMDD format strings, or you could pass in the format you want to check, if you don't want it to be flexible.

Upvotes: 1

Related Questions