Waqas
Waqas

Reputation: 1

how to query in oracle where string like 'XXX-XX' i.e. string contains (hyphen)

  1. select caseid, indcase from casemt where indcase like '%19663-01%'
  2. select caseid, indcase from casemt where indcase like '%19663%'

Query1 returns 0 records. while query2 returns multiple records where one of the record's indcase is equal to 19663-01

so the question is why query1 is returning 0 records despite the fact that indcase = 19663-01 exist in oracle table

Upvotes: 0

Views: 1237

Answers (2)

MT0
MT0

Reputation: 167972

You can debug your data:

SELECT caseid,
       indcase,
       DUMP(indcase) AS actual_binary,
       DUMP('19663-01') AS expected_binary
FROM   casemt
WHERE  indcase like '%19663_01%'

Which, for the sample data:

CREATE TABLE casemt (caseid, indcase) AS
SELECT 1, '19663-01' FROM DUAL UNION ALL
SELECT 2, CAST(UNISTR('19663\201001') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 3, CAST(UNISTR('19663\201201') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 4, CAST(UNISTR('19663\201301') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 5, CAST(UNISTR('19663\201401') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 6, CAST(UNISTR('19663\204301') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 7, CAST(UNISTR('19663\FE6301') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 8, CAST(UNISTR('19663\FF0D01') AS VARCHAR2(10)) FROM DUAL UNION ALL
SELECT 9, CAST(UNISTR('19663\180601') AS VARCHAR2(10)) FROM DUAL;

Outputs:

CASEID INDCASE ACTUAL_BINARY EXPECTED_BINARY
1 19663-01 Typ=1 Len=8: 49,57,54,54,51,45,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
2 19663‐01 Typ=1 Len=10: 49,57,54,54,51,226,128,144,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
3 19663‒01 Typ=1 Len=10: 49,57,54,54,51,226,128,146,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
4 19663–01 Typ=1 Len=10: 49,57,54,54,51,226,128,147,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
5 19663—01 Typ=1 Len=10: 49,57,54,54,51,226,128,148,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
6 19663⁃01 Typ=1 Len=10: 49,57,54,54,51,226,129,131,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
7 19663﹣01 Typ=1 Len=10: 49,57,54,54,51,239,185,163,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
8 19663-01 Typ=1 Len=10: 49,57,54,54,51,239,188,141,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49
9 19663᠆01 Typ=1 Len=10: 49,57,54,54,51,225,160,134,48,49 Typ=96 Len=8: 49,57,54,54,51,45,48,49

Which, shows that there are multiple different hyphen characters:

  • the ACSII character with the character code 45; and
  • the characters in the extended character sets with the UTF-8 (hex) codes of:
    • U+1806 (Mongolian Todo Soft Hyphen)
    • U+2010 (Hyphen)
    • U+2012 (Figure Dash)
    • U+2013 (En Dash)
    • U+2014 (Em Dash)
    • U+2043 (Hyphen Bullet)
    • U+FE63 (Small Hyphen-Minus)
    • U+FF0D (Fullwidth Hyphen-Minus)

And your data could be using any of them for a hyphen character.


If you want to match them all then you can use:

SELECT caseid,
       indcase
FROM   casemt
WHERE  TRANSLATE(
         indcase,
         UNISTR('\1806\2010\2012\2013\2014\2043\FE63\FF0D'),
         '--------'
       ) like '%19663-01%'

Or you can update your data to try to normalise it to a single hyphen type:

UPDATE casemt
SET   indcase  = TRANSLATE(
                   indcase,
                   UNISTR('\1806\2010\2012\2013\2014\2043\FE63\FF0D'),
                   '--------'
                 )
WHERE indcase <> TRANSLATE(
                   indcase,
                   UNISTR('\1806\2010\2012\2013\2014\2043\FE63\FF0D'),
                   '--------'
                 );

and then your hyphens will be the ASCII hyphen and not a hyphen from the extended character set and then you can use:

SELECT caseid,
       indcase
FROM   casemt
WHERE  indcase LIKE '%19663-01%'

db<>fiddle here

Upvotes: 1

Gnqz
Gnqz

Reputation: 3382

Can you copy the '%19663-01%' exactly as it is in the query and check the result of the following:

SELECT DUMP('%19663-01%',16)
  FROM dual;

It should return this (considering your base is UTF8 encoded):

Typ=96 Len=10: 25,31,39,36,36,33,2d,30,31,25

If the result is different you just have copied something with other encoding that transforms into some character that has no visual representation in your encoding.

Upvotes: 0

Related Questions