Reputation: 21
While calling XMLTYPE() on a CLOB column which should contain a valid XML1.0 xml (the db encoding should be UTF-8), the following error message comes out (I am from Italy):
ORA-31011: Analisi XML non riuscita
ORA-19202: Errore durante l'elaborazione XML
LPX-00217: carattere non valido 15577023 (U+EDAFBF)
Error at line 240
ORA-06512: a "SYS.XMLTYPE", line 272
ORA-06512: a line 1
31011. 00000 - "XML parsing failed"
*Cause: XML parser returned an error while trying to parse the document.
*Action: Check if the document to be parsed is valid.
Now this invalid character is given as Unicode codepoint EDAFBF. The problem is that according to Unicode spec (wikipedia), there are no codepoints beyond 10FFFF. So what could this error mean?
Inspecting this CLOB with SQLDeveloper (and copying it to Notepad++ with encoding set to utf-8) does not reveal anything unusual beyond some strange characters which apparently came from the user browser when he copied text from a Microsoft Word document (but the CLOB, at least as copied from SQLDeveloper UI and exhibited by Notepad++ with UTF-8 encoding, seems to be a valid UTF-8 text).
Is there a way to reproduce this error populating Oracle directly (from SQLDeveloper or in some other way)? (contacting the end user to understand what he put exactly in the web form is problematic)
Upvotes: 2
Views: 3865
Reputation: 191235
Not addressing the first part of the question, but you can reproduce it with a RAW value:
select xmltype('<dummy>'
|| utl_raw.cast_to_varchar2(cast('EDAFBF' as raw(6)))
|| '</dummy>')
from dual;
Error report -
SQL Error: ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00217: invalid character 15577023 (U+EDAFBF)
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 310
ORA-06512: at line 1
Just selecting the character:
select utl_raw.cast_to_varchar2(cast('EDAFBF' as raw(6)))
from dual;
... is displayed as a small square with an even smaller question mark inside it (I think) in SQL Developer for me (version 4.1), but that's just how it's choosing to render that; copying and pasting still gives the replacement character � since the codepoint is, as you say, invalid. XMLType is being stricter about the validity than CLOB. The unistr()
function doesn't handle the value either, which isn't really a surprise.
(You don't need to cast the string to raw(6)
, just utl_raw.cast_to_varchar2('EDAFBF')
has the same effect; but doing it explicitly makes it a bit clearer what's going on, I think).
I don't see how that could have got into your file without some kind of corruption, possibly through a botched character set conversion I suppose. You could maybe use dbms_lob.replace_fragment()
or similar to replace or remove that character, but of course there may be others you haven't hit yet, and at best you'd only be treating the symptoms rather than the cause.
Upvotes: 7