Reputation: 22949
I have a varchar2
field containing some text, including accented characters ( e.g. à
,è
,...) and other non literal characters (<
,!
,;
...).
I need to transform this field in html compliant format, by using one of the following output formats:
input output1 output2
á á á
â â â
Here is an example of starting data and two possible needed output, with something I unsuccessfully tried:
with test(starting, needed, needed2) as (
select 'abc À. < ! ; à ',
'abc À. < ! ; à',
'abc À. < ! ; à'
from dual)
select starting, needed, needed2, UTL_I18N.escape_reference(starting) as result, 'UTL_I18N' as function from test union all
select starting, needed, needed2, convert(starting, 'US7ASCII' ) , 'convert' from test union all
select starting, needed, needed2, HTF.ESCAPE_SC(starting) , 'htf' from test union all
select starting, needed, needed2, asciiStr(starting) , 'ascii' from test union all
select starting, needed, needed2, dbms_xmlgen.convert(starting) , 'dbms_xmlgen' from test union all
select starting, needed, needed2, TRANSLATE(starting USING CHAR_CS) , 'translate' from test;
The output (needed and obtained so far):
STARTING NEEDED NEEDED2 RESULT FUNCTION
-------------------- ---------------------------------------- ---------------------------------------- ------------------------------ -----------
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à UTL_I18N
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc ?. < ! ; ? convert
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à htf
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc \00B7. < ! ; \2026 ascii
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à dbms_xmlgen
abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à abc À. < ! ; à translate
The resulting string is used to build a file compliant with Excel (even Excel 2003), by applying a transformation on a XML; this does not support accented characters, so I need a conversion.
I could use some regular expressions, but I was trying to get a better solution. I'm using Oracle 11.2.0.3.0.
Upvotes: 4
Views: 3423
Reputation: 59436
I think you have to start with ASCIISTR
function.
For non-Unicode characters it returns the UTF-16 codepoint in Hex format, e.g. ´00B7´
Then you need a converter to transform Hex numbers into decimal numbers, for example this one:
FUNCTION Base2Dec(BaseString IN VARCHAR2, Base IN INTEGER DEFAULT 16)
RETURN INTEGER DETERMINISTIC IS
BaseNumber INTEGER := 0;
HexString CONSTANT CHAR(16) := '0123456789ABCDEF';
BEGIN
IF Base > 16 THEN
RAISE NUMERIC_OVERFLOW;
END IF;
IF BaseString IS NULL THEN
RETURN NULL;
ELSE
FOR i IN 1..LENGTH(BaseString) LOOP
BaseNumber := BaseNumber * Base + INSTR(HexString, UPPER(SUBSTR(BaseString, i, 1))) - 1;
END LOOP;
RETURN BaseNumber;
END IF;
END Base2Dec;
Based on this you can compose your escaped strings.
Upvotes: 1
Reputation: 2182
As far as I know there's no built-in function to change accented characters into html entities.
But you could use a function like this one:
create or replace function to_htmlentity(text varchar2)
return varchar2 deterministic is
result varchar(200):='';
begin
for letter in (
select
substr(text,level,1) as character,
ascii(substr(text,level,1)) as ascii
from dual connect by level <= length(text)
)loop
if letter.ascii > 128
or letter.ascii in (34,38,60,62) --",&,<,>
then
result := result||'&#'||letter.ascii||';';
else
result := result||letter.character;
end if;
end loop;
return result;
end;
This function takes any non-ascii(128) character and changes into it's respective html entity. I find it more clear than using a RegEx.
Then you can use it as follows:
with test(starting, needed, needed2) as (
select 'abc À. < ! ; à',
'abc À. < ! ; à',
'abc À. < ! ; à'
from dual)
select starting, needed, needed2, to_htmlentity(starting) as result from test
Upvotes: 1