Reputation: 69
I've spent about 18 hours trying different things and searching around; I finally give up and have to ask you guys.
Backstory: I am finally migrating an old MS Access database to MySQL (version 5.6.16-log).
Problem: Some Unicode text in the Access database contains characters that take four bytes in UTF-8.
MySQL still has a problem with inserting four-byte UTF-8 characters. This problem is getting old and I was surprised to discover it's not fixed yet: http://bugs.mysql.com/bug.php?id=67297
I'm using the "MySQL ODBC 5.3 Unicode Driver" (the latest beta development release) to transfer data between the databases. No matter what I try, the process freezes when I try to insert a string with 4-byte UTF-8 characters (the thread uses 100% CPU forever). I have tried all the workarounds suggested everywhere on the Internet; nothing works.
Now I will just accept the limitations of MySQL: I can't store all Unicode characters.
So I want to remove all 4-byte UTF-8 characters from the text before I insert it into the database. But I can't for the life of me find a way to do it in classic ASP.
Can anybody help?
(I can't avoid using ASP, by the way; there is far too much code to rewrite in a different language. Just changing databases is a remarkable feat; there are several of them and it will take days to complete.)
Edit: A solution in JScript is also acceptable, since it can be run from ASP pages.
Upvotes: 2
Views: 4202
Reputation: 3111
This should work:
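Function UTF8Filter(strString)
    ' Keeps only printable ASCII (codes 32 to 126); everything else,
    ' including all non-ASCII text, is dropped.
    Dim i, charCode, strResult
    On Error Resume Next
    strResult = ""
    For i = 1 To Len(strString)
        charCode = AscW(Mid(strString, i, 1))
        If charCode >= 32 And charCode <= 126 Then ' here was OR
            ' Append valid character to the result instead of overwriting the input
            strResult = strResult & Mid(strString, i, 1)
        End If
    Next
    UTF8Filter = strResult
    On Error Goto 0
End Function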
Updated function:
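Function Remove4ByteUTF8(strString)
    Dim objRegEx
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Global = True
    objRegEx.IgnoreCase = True
    ' VBScript strings are UTF-16, so a 4-byte UTF-8 character is stored as
    ' a surrogate pair; match the surrogate code units, not UTF-8 bytes.
    ' The pattern takes no /.../ delimiters or trailing flags.
    objRegEx.Pattern = "[\uD800-\uDFFF]"
    Remove4ByteUTF8 = objRegEx.Replace(strString, "")
End Function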
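Since the question also accepts JScript, here is a minimal sketch of the same idea in that language. It assumes the input is already a string; the function name remove4ByteUtf8 and the sample text are made up for illustration. Characters that need four bytes in UTF-8 are the code points above U+FFFF, and in a UTF-16 string each of them shows up as a pair of surrogate code units, so removing every code unit in the range U+D800 to U+DFFF should strip them while leaving ordinary accented and non-Latin text alone.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumes `text` is a string. JScript strings are UTF-16, so any character
// that needs four bytes in UTF-8 is stored as a surrogate pair.
function remove4ByteUtf8(text) {
    // Remove every surrogate code unit (U+D800 to U+DFFF). This drops
    // complete pairs as well as any stray half left by earlier truncation.
    return text.replace(/[\uD800-\uDFFF]/g, "");
}

var sample = "ok \uD83D\uDE00 done";    // contains U+1F600, a 4-byte character
var cleaned = remove4ByteUtf8(sample);  // "ok  done" (both spaces are kept)
```

In an ASP page this could sit in a script block declared with language="JScript" and runat="server", and be called on each text field just before the INSERT.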
Upvotes: 2