Reputation: 1300

How to add a UNICODE Character in a SQL Query

I am trying within others to add the UNICODE \u0099 in a CLOB. In my case I have broken leftovers of unicodes e.g (/0099) in my INDICATION column which if it matters is a CLOB, so I want to replace them with each appropriate character. So I am using the REPLACE to find the broken code and replace it with the proper one.

UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0099', '™');

I cannot find anywhere online though how can I get the unicode and use it in a query. In this case the correct way to do that would replace the '™'. To be more specific my cases are: 00AD 2122 0092 0095 0097 0096 0094. Only the first two 00AD and 2122 can be solved using UNISTR.

Upvotes: 1

Answers (3)

Sergiusz Wolicki

Reputation: 61

As I am not 100% sure what the actual problem is, let me give an over-exhaustive answer.

I see two aspects in this problem. The first is how to identify characters that should be put in place of the broken Unicode escape sequence. Here, the problem is that the broken escape sequences are not for Unicode characters. In Unicode, the listed codes (\0099, etc.) are upper range ISO Latin 1 control codes, most certainly not what you want. However, if interpreted in most single-byte Windows code pages, these codes have the meaning that you have identified yourself. The only code that is both Unicode and Windows is \00AD, which soft hyphen (though, the Windows code does not have the leading zero). This identification exercise needs to be manual, unless you know the original encoding of the characters, in which case some automation can be programmed.

The second aspect is how to enter the replacement character into the UPDATE statement. Here the problem is that any SQL statement needs to be represented first in the client character set and then in the database character set. If you enter a statement in SQL Developer, the client character set is Unicode and you can enter any printable character directly into the string literal. Non-printable characters need to be encoded and UNISTR is a convenient way to do this. The CHR() function is another possibility. Here, you need to specify the code of the character in the database character set.

Now to the database character set. As you store your data in CLOB, anything that you want to store there needs to be representable in the database character set. Hence, either the database character set can represent all relevant characters and you can use them in literals verbatim or you cannot store these characters in CLOB at all.

Now, if your database character set is one of the Windows code pages, like WE8MSWIN1252 or EL8MSWIN1253, and you know the broken escape sequences denote codes in this code page, you might try PL/SQL coding to translate codes automatically. I have not found a way to do this without PL/SQL as you cannot do re-coding processing for all broken escapes in a CLOB value using a pure SQL expression. You could do this for a single escape only.

In PL/SQL, you would iterate with INSTR or REGEXP_INSTR over each value looking for the escape syntax, retrieve the text of the numeric code, convert this text to the actual numeric code and pass to CHR() to write to the result CLOB. If working of CLOBs with arbitrary length (>32767), you would need to use a temporary CLOB for the result. For shorter CLOBs, you can work using PL/SQL VARCHAR2(32767) much easier. Still, it seems like a lot of work. With a limited number of possible broken escape sequences, a simplified solution you found may be less work.

Upvotes: 0

Sir. Hedgehog

Reputation: 1300

The best way I found in order to solve this problem was, while iterating over the tables of ASCII and using the actual symbol for every unicode needed, to find each respective character in the ASCII table and use the HEX code. If you have the HEX code you can then use that in UNISTR('code') to get your respective result in SQL. In my case that would be:

UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0099', UNISTR('\2122'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0092', UNISTR('\0027'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0095', UNISTR('\2022'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0097', UNISTR('\2014'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0096', UNISTR('\2013'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\0094', UNISTR('\0022'));
UPDATE PRODUCT SET INDICATION = REPLACE(INDICATION, '\00AD', UNISTR('\00AD'));

However I wont accept this as the correct answer, because I dont think it is the best case scenario. I cannot accept that there is no better way to provide the UNICODE code and get the respective result in SQL. Thus I will leave this open for someone that has the appropriate answer in the future.

Upvotes: 2

psaraj12

Reputation: 5072

Use rawtohex to find out how oracle stores '™' internally in database

 select rawtohex('™') from dual;

Upvotes: 1

How to add a UNICODE Character in a SQL Query

Answers (3)

Related Questions