Aleksej
Aleksej

Reputation: 22949

Oracle convert accented characters in html format

I have a varchar2 field containing some text, including accented characters ( e.g. à,è,...) and other non literal characters (<,!,;...).

I need to transform this field in html compliant format, by using one of the following output formats:

input     output1     output2
    á     &#225;      &aacute;  
    â     &#226;      &acirc;

Here is an example of starting data and two possible needed output, with something I unsuccessfully tried:

with test(starting, needed, needed2) as (
    select 'abc À. < ! ; à ',
           'abc &Agrave;. &lt; ! ; &agrave;',
           'abc &#192;. &#60; ! ; &#224;'
    from dual)
select starting, needed, needed2, UTL_I18N.escape_reference(starting) as result, 'UTL_I18N' as function from test union all
select starting, needed, needed2, convert(starting, 'US7ASCII' ) ,               'convert' from test union all
select starting, needed, needed2, HTF.ESCAPE_SC(starting) ,                      'htf' from test union all
select starting, needed, needed2, asciiStr(starting) ,                           'ascii' from test union all
select starting, needed, needed2, dbms_xmlgen.convert(starting) ,                'dbms_xmlgen' from test union all
select starting, needed, needed2, TRANSLATE(starting USING CHAR_CS) ,            'translate' from test;

The output (needed and obtained so far):

STARTING             NEEDED                                   NEEDED2                                  RESULT                         FUNCTION
-------------------- ---------------------------------------- ---------------------------------------- ------------------------------ -----------
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc À. &lt; ! ; à              UTL_I18N
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc ?. < ! ; ?                 convert
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc À. &lt; ! ; à              htf
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc \00B7. < ! ; \2026         ascii
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc À. &lt; ! ; à              dbms_xmlgen
abc À. < ! ; à       abc &Agrave;. &lt; ! ; &agrave;          abc &#192;. &#60; ! ; &#224;             abc À. < ! ; à                 translate

The resulting string is used to build a file compliant with Excel (even Excel 2003), by applying a transformation on a XML; this does not support accented characters, so I need a conversion.

I could use some regular expressions, but I was trying to get a better solution. I'm using Oracle 11.2.0.3.0.

Upvotes: 4

Views: 3423

Answers (2)

Wernfried Domscheit
Wernfried Domscheit

Reputation: 59436

I think you have to start with ASCIISTR function. For non-Unicode characters it returns the UTF-16 codepoint in Hex format, e.g. ´00B7´

Then you need a converter to transform Hex numbers into decimal numbers, for example this one:

FUNCTION Base2Dec(BaseString IN VARCHAR2, Base IN INTEGER DEFAULT 16) 
    RETURN INTEGER DETERMINISTIC IS

    BaseNumber INTEGER := 0;
    HexString CONSTANT CHAR(16) := '0123456789ABCDEF';    
BEGIN
    IF Base > 16 THEN 
        RAISE NUMERIC_OVERFLOW;
    END IF;
    IF BaseString IS NULL THEN
        RETURN NULL;
    ELSE
        FOR i IN 1..LENGTH(BaseString) LOOP
            BaseNumber := BaseNumber * Base + INSTR(HexString, UPPER(SUBSTR(BaseString, i, 1))) - 1;
        END LOOP;    
        RETURN BaseNumber;
    END IF;
END Base2Dec;

Based on this you can compose your escaped strings.

Upvotes: 1

Nefreo
Nefreo

Reputation: 2182

As far as I know there's no built-in function to change accented characters into html entities.

But you could use a function like this one:

create or replace function to_htmlentity(text varchar2)
return varchar2 deterministic is
    result varchar(200):='';
begin 
    for letter in (
        select 
        substr(text,level,1) as character,
        ascii(substr(text,level,1)) as ascii 
        from dual connect by level <= length(text)
    )loop
        if letter.ascii > 128 
        or letter.ascii in (34,38,60,62) --",&,<,> 
        then
            result := result||'&#'||letter.ascii||';';
        else
            result := result||letter.character;
        end if;
    end loop;
    return result;
end;

This function takes any non-ascii(128) character and changes into it's respective html entity. I find it more clear than using a RegEx.

Then you can use it as follows:

with test(starting, needed, needed2) as (
    select 'abc À. < ! ; à',
           'abc &Agrave;. &lt; ! ; &agrave;',
           'abc &#192;. &#60; ! ; &#224;'
    from dual)
select starting, needed, needed2, to_htmlentity(starting) as result from test

Upvotes: 1

Related Questions