Reputation: 129
I have a (PostgreSQL) database that cannot accept Unicode characters, but users enter them as Unicode through ColdFusion. I convert them to ASCII as shown below and store the result in the database. That works fine. Here is the code I use to convert someone's first name (containing Chinese, Korean, or other non-ASCII characters) into ASCII:
<cfset strLen = len(URL.firstName)>
<cfset tempCharAll = 'START_TAG'>
<cfloop from="1" to="#strLen#" index="i">
    <cfset current_char = mid(URL.firstName, i, 1)>
    <cfset tempChar = formatBaseN(asc(current_char), 16)>
    <cfset tempCharAll = tempCharAll & tempChar>
</cfloop>
<cfset URL.lastName = tempCharAll>
<cfset URL.firstName = tempCharAll>
Now how do I reverse this, so that ColdFusion converts the stored value back to Unicode and the correct Korean/Chinese characters display when someone logs in? Thanks.
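One thing that makes the stored value hard to reverse: formatBaseN() returns a variable number of hex digits ("t" gives 74, a Hangul syllable gives four digits), and the loop joins them with no separator. A stored 6162 could therefore be "ab" (61, 62) or the single character U+6162. A minimal sketch of a reversible variant pads every code to exactly four hex digits, assuming each character's code fits in four digits (true for characters in the Basic Multilingual Plane, which covers common Chinese and Korean text). The variable names firstName, hexCode and encoded are illustrative, and the START_TAG prefix is left out.

```cfml
<!--- hypothetical sample; in the question this would be URL.firstName --->
<cfset firstName = "t" & chr(54620)>
<cfset encoded = "">
<cfloop from="1" to="#len(firstName)#" index="i">
    <!--- asc() gives the character's numeric code; formatBaseN() writes it in hex --->
    <cfset hexCode = formatBaseN(asc(mid(firstName, i, 1)), 16)>
    <!--- left-pad to exactly four digits: "74" becomes "0074" --->
    <cfset encoded = encoded & right("000" & hexCode, 4)>
</cfloop>
<cfoutput>#encoded#</cfoutput>
```

With fixed-width codes, decoding can read the value four characters at a time and apply chr(inputBaseN(chunk, 16)) to each chunk.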
Here is an attempt that doesn't work:
<CFOUTPUT> input:</br></br></CFOUTPUT>
<cfset tempChar = "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset tempChar = formatBaseN(asc(tempChar),16)>
<CFOUTPUT> encoded:</br></br></CFOUTPUT>
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset varCoded = CharsetDecode(tempChar, "windows-1252")>
<cfset strUnEncoded = CharsetEncode(varCoded, "utf-8")>
<CFOUTPUT> decoded:</br></br></CFOUTPUT>
<CFOUTPUT> #strUnEncoded#</br></br></CFOUTPUT>
It outputs 74 for both "encoded" and "decoded", when "decoded" should be "t".
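The 74 comes back because CharsetDecode() and CharsetEncode() convert between a string and its raw bytes; they never interpret hex digits. Decoding the text "74" as windows-1252 yields the bytes of the characters 7 and 4 (0x37 and 0x34), and encoding those bytes as UTF-8 simply gives "74" again. A small check, using BinaryEncode() to display those bytes, next to InputBaseN() plus Chr(), which performs the conversion that was actually intended; the variable names are illustrative:

```cfml
<cfset hexText = "74">
<!--- the bytes of the text "74" itself: 0x37 ("7") and 0x34 ("4") --->
<cfset bytesAsHex = binaryEncode(charsetDecode(hexText, "windows-1252"), "hex")>
<!--- read "74" as a base-16 number (116) and turn that code back into a character --->
<cfset decodedChar = chr(inputBaseN(hexText, 16))>
<cfoutput>bytes: #bytesAsHex#<br>decoded: #decodedChar#</cfoutput>
```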
Upvotes: 2
Views: 2812
Reputation: 1598
Update to this answer for CF10 / Railo 4.x: there's a new function, Canonicalize(), that decodes escape sequences such as \u00E9 back into the characters they represent, ready for output.
Example usage:
#Canonicalize('h\u00E9',1,1)#
You can also use it in CF8 and 9 as described here
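Building on that, one option is to store each character as a \uXXXX escape, which is plain ASCII, and let Canonicalize() restore the characters on output. This is a sketch, assuming CF10+ or Railo 4.x and that Canonicalize() decodes \uXXXX escapes the way the 'h\u00E9' example above does; the variable names are illustrative.

```cfml
<!--- hypothetical sample: "t" followed by the Hangul syllable U+D55C --->
<cfset original = "t" & chr(54620)>
<cfset escaped = "">
<cfloop from="1" to="#len(original)#" index="i">
    <!--- e.g. "t" (code 116, hex 74) becomes \u0074; padding keeps four digits per escape --->
    <cfset escaped = escaped & "\u" & right("000" & formatBaseN(asc(mid(original, i, 1)), 16), 4)>
</cfloop>
<!--- escaped holds only ASCII and can be stored as-is --->
<!--- on the way out, decode the escapes; the two true flags reject multiple or mixed encodings --->
<cfset restored = canonicalize(escaped, true, true)>
<cfoutput>#escaped#<br>#restored#</cfoutput>
```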
Upvotes: 0
Reputation: 315
I am not an encoding expert at all, but I can see you're encoding with formatBaseN() and never decoding with inputBaseN(). You also need chr() in the last line to turn the resulting number back into a character:
<CFOUTPUT> input:</br></br></CFOUTPUT>
<cfset tempChar = "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset tempChar = formatBaseN(asc(tempChar),16)>
<CFOUTPUT> encoded:</br></br></CFOUTPUT>
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset varCoded = CharsetDecode(tempChar, "windows-1252")>
<cfset strUnEncoded = InputBaseN(CharsetEncode(varCoded, "utf-8"),16)>
<CFOUTPUT> decoded:</br></br></CFOUTPUT>
<CFOUTPUT> #chr(strUnEncoded)#</br></br></CFOUTPUT>
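The same inputBaseN()/chr() idea extends to a whole stored name, provided the encoder wrote a fixed number of hex digits per character; with the variable-length codes from the question's loop there is no reliable place to split. A minimal sketch, assuming each code was left-padded to four digits (for example with right("000" & hexCode, 4)) and any START_TAG prefix has already been removed, for instance with replace(); the sample value is hypothetical:

```cfml
<!--- hypothetical stored value: 0074 = "t", d55c = U+D55C (a Hangul syllable) --->
<cfset stored = "0074d55c">
<cfset decoded = "">
<!--- step through the string four hex digits at a time --->
<cfloop from="1" to="#len(stored)#" index="i" step="4">
    <!--- hex chunk -> number -> character --->
    <cfset decoded = decoded & chr(inputBaseN(mid(stored, i, 4), 16))>
</cfloop>
<cfoutput>#decoded#</cfoutput>
```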
It seems like this could be simplified to the following, but like I said, I'm not all that familiar with character encoding:
<CFOUTPUT> input:</br></br></CFOUTPUT>
<cfset tempChar = "t">
<CFOUTPUT> #tempChar#</br></br></CFOUTPUT>
<cfset strUnEncoded = asc(tempChar)>
<CFOUTPUT> decoded:</br></br></CFOUTPUT>
<CFOUTPUT> #chr(strUnEncoded)#</br></br></CFOUTPUT>
Upvotes: 1
Reputation: 2982
Try:
<cfset varCoded = CharsetDecode(yourString.stringColumn, "windows-1252")>
<cfset strUnEncoded = CharsetEncode(varCoded, "utf-8")>
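CharsetDecode() and CharsetEncode() become useful for this problem when paired with BinaryEncode() and BinaryDecode(): turn the name into its UTF-8 bytes, store those bytes as a hex string (plain ASCII), and reverse both steps on the way out. This is a different technique from the per-character loop in the question; a minimal sketch with a hypothetical sample value:

```cfml
<!--- hypothetical input: "t" followed by the Hangul syllable U+D55C --->
<cfset original = "t" & chr(54620)>
<!--- string -> UTF-8 bytes -> hex text, safe to store in an ASCII-only column --->
<cfset stored = binaryEncode(charsetDecode(original, "utf-8"), "hex")>
<!--- hex text -> UTF-8 bytes -> string --->
<cfset restored = charsetEncode(binaryDecode(stored, "hex"), "utf-8")>
<cfoutput>#stored#<br>#restored#</cfoutput>
```

Hex uses two characters per byte, so a three-byte Korean or Chinese character takes six; passing "base64" instead of "hex" to both BinaryEncode() and BinaryDecode() is more compact.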
Upvotes: 0