Thor Russell

Reputation: 129

Converting ASCII characters back to Unicode in ColdFusion

I have a (Postgres) database that cannot accept Unicode characters, but they are input as Unicode from ColdFusion. I convert them to ASCII as shown here and store them in the database. That works fine, and here is the code I use to convert someone's first name (containing Chinese/Korean characters etc.) into ASCII.

<cfset strLen = len(URL.firstName)>
<cfset tempCharAll = 'START_TAG'>
<cfloop from="1" to="#strLen#" index="i">
    <cfset current_char = mid(URL.firstName, i, 1)>
    <cfset tempChar = formatBaseN(asc(current_char), 16)>
    <cfset tempCharAll = tempCharAll & tempChar>
</cfloop>
<cfset URL.lastName = tempCharAll>
<cfset URL.firstName = tempCharAll>

Now how do I reverse this and make ColdFusion convert the stored value back to Unicode, so the correct Korean/Chinese characters display when someone logs in etc.? Thanks.

This attempt doesn't work. If I use this code:

<CFOUTPUT> input:</br></br></CFOUTPUT> 
<cfset tempChar =  "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset tempChar =  formatBaseN(asc(current_char),16)>
<CFOUTPUT> encoded:</br></br></CFOUTPUT> 
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset varCoded = CharsetDecode(tempChar, "windows-1252")>
<cfset strUnEncoded = CharsetEncode(varCoded, "utf-8")> 
<CFOUTPUT> decoded:</br></br></CFOUTPUT> 
<CFOUTPUT> #strUnEncoded#</br></br></CFOUTPUT> 

Then it outputs 74 for both the encoded and the decoded value, when it should output "t" for the decoded value.

Upvotes: 2

Views: 2812

Answers (3)

daamsie

Reputation: 1598

Update to this answer for CF10 / Railo 4.x: there's a new function, Canonicalize(), that converts escaped ASCII sequences back into the corresponding characters for output.

Example usage:

#Canonicalize('h\u00E9',1,1)#

You can also use it in CF8 and 9 as described here

Upvotes: 0

Dominic O'Connor

Reputation: 315

I am not an encoding expert at all, but I can see you're formatting to base N with formatBaseN() and never decoding from base N with InputBaseN(). You also need to turn the resulting number back into a character using chr() in the last line:

<CFOUTPUT> input:</br></br></CFOUTPUT> 
<cfset tempChar =  "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset tempChar =  formatBaseN(asc(tempChar),16)>
<CFOUTPUT> encoded:</br></br></CFOUTPUT> 
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset varCoded = CharsetDecode(tempChar, "windows-1252")>
<cfset strUnEncoded = InputBaseN(CharsetEncode(varCoded, "utf-8"),16)> 
<CFOUTPUT> decoded:</br></br></CFOUTPUT> 
<CFOUTPUT> #chr(strUnEncoded)#</br></br></CFOUTPUT> 

It seems like this could be simplified to the following, but like I said, I'm not all that familiar with character encoding:

<CFOUTPUT> input:</br></br></CFOUTPUT> 
<cfset tempChar =  "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset strUnEncoded =  asc(tempChar)>
<CFOUTPUT> decoded:</br></br></CFOUTPUT> 
<CFOUTPUT> #chr(strUnEncoded)#</br></br></CFOUTPUT> 
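
For a whole name rather than a single character, the same formatBaseN() / InputBaseN() / chr() idea could be sketched roughly as below. This is only a sketch under some assumptions: the function names hexEncode and hexDecode are made up for illustration, every character is padded to four hex digits so the stored string can be split apart again (the loop in the question writes variable-length codes with no separator, which cannot be reliably reversed), the START_TAG prefix from the question is left out, and all characters are assumed to fit in four hex digits (code points up to FFFF).

```cfml
<!--- Hypothetical helpers for illustration; names are not from the original post. --->
<cffunction name="hexEncode" returntype="string" output="false">
    <cfargument name="text" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <cfloop from="1" to="#len(arguments.text)#" index="i">
        <!--- Pad each character code to exactly 4 hex digits so decoding knows where to split --->
        <cfset result = result & right("0000" & formatBaseN(asc(mid(arguments.text, i, 1)), 16), 4)>
    </cfloop>
    <cfreturn result>
</cffunction>

<cffunction name="hexDecode" returntype="string" output="false">
    <cfargument name="hex" type="string" required="true">
    <cfset var result = "">
    <cfset var i = 0>
    <!--- Assumes the input length is a multiple of 4, as produced by hexEncode --->
    <cfloop from="1" to="#len(arguments.hex)#" index="i" step="4">
        <!--- inputBaseN() turns the hex digits back into a number, chr() turns the number into a character --->
        <cfset result = result & chr(inputBaseN(mid(arguments.hex, i, 4), 16))>
    </cfloop>
    <cfreturn result>
</cffunction>

<cfset encoded = hexEncode("t")>
<cfoutput>encoded: #encoded#<br>decoded: #hexDecode(encoded)#</cfoutput>
```

With the fixed width, "t" is stored as 0074 instead of 74, so names already saved with the variable-length scheme would need to be re-encoded before this decoder could read them.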

Upvotes: 1

Fergus

Reputation: 2982

Try:

<cfset varCoded = CharsetDecode(yourString.stringColumn, "windows-1252")>
<cfset strUnEncoded = CharsetEncode(varCoded, "utf-8")>

Upvotes: 0
