Oracle returns wrong values with LENGTH and INSTR

I'm aiming to retrieve the position of chars in a string, plus the length of a string.

The value of the notes field in the internal_notes table, for the row with ticket_id equal to 1679467247 is literally 'this is a test note'.

When I use the functions against the literal stirngs they work, but when I retrieve the info directly from the table column, the values are just wrong.

Any ideas as to what might be happening?

select notes,
       LENGTH(notes),
       INSTR(notes,' ')
FROM   internal_notes
where  ticket_id = 1679467247

union

select 'this is a test note',
       LENGTH('this is a test note'),
       INSTR('this is a test note',' ')
from   dual

This returns the following:

NOTES               LENGTH(NOTES) INSTR(NOTES,' ')
------------------- ------------- ----------------
this is a test note            32                11
this is a test note            19                5

Upvotes: 0

Views: 822

Answers (1)

Alex Poole
Alex Poole

Reputation: 191275

You can get this apparent inconsistency if you have zero-width characters in the value; for example:

create table internal_notes(ticket_id number, notes varchar2(32 char));
insert into internal_notes(ticket_id, notes)
values (1679467247, unistr('\200c\200cthis is a test note\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c\200c'));
insert into internal_notes(ticket_id, notes)
values (1679467248, unistr('\200c\200cthis is a test note'));
insert into internal_notes(ticket_id, notes)
values (1679467249, 'this is a test note');

select notes,
       LENGTH(notes),
       INSTR(notes,' ')
FROM   internal_notes
where  ticket_id = 1679467247;

NOTES                            LENGTH(NOTES) INSTR(NOTES,'')
-------------------------------- ------------- ---------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌                         32               7

I said 'apparent inconsistency' because those numbers are correct; they just don't look it if you can't see some of the characters. Invisible characters still count.

As @MTO suggested you can use the dump() function to see exactly what is stored in the table, in decimal or hex representation, or mixed with 'normal' characters which can be a bit easier to interpret:

select notes,
       LENGTH(notes),
       INSTR(notes,' '),
       dump(notes, 1000) as dmp
FROM   internal_notes;

NOTES                            LENGTH(NOTES) INSTR(NOTES,'')
-------------------------------- ------------- ---------------
DMP                                                                                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
‌‌this is a test note‌‌‌‌‌‌‌‌‌‌‌                         32               7 
Typ=1 Len=58: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,8c,e2,80,
8c                                                                                                                                                                    

‌‌this is a test note                         21               7 
Typ=1 Len=25: e2,80,8c,e2,80,8c,t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e                                                                                                 

this is a test note                         19               5 
Typ=1 Len=19: t,h,i,s, ,i,s, ,a, ,t,e,s,t, ,n,o,t,e                                                                                                                   

db<>fiddle - though that is showing the zero-width characters as question marks, unlike SQL Developer and SQL*Plus.

Other zero-width characters are available (space, non-joiner, joiner), and you might see something different in your dump - it just has to be something your client doesn't display at all. Whatever is in there, if it affects all rows and not just that single ticket, then how and why is probably down to whatever front-end/application populates the table - possibly from a character set mismatch, but it could be intentional. If it is just that ticket then that note is an interesting test...

Upvotes: 4

Related Questions