michael

Reputation:

Classic ASP gremlins, getting a Â inserted into text whenever an HTML special character is used

I'm working on an older classic ASP site, and there's a form that allows the user to enter some text (into a multiline textbox). If they add an HTML special character like ® (registered trademark), it is inserted correctly. But when they go to edit the data, using the same form, the update adds a stray 'Â' (A with circumflex) in front of the registered trademark. The content type is utf-8.

Any ideas?

Thanks for any time you give this. It's been driving me nuts. -m

Upvotes: 5

Views: 15542

Answers (4)

James Curran

Reputation: 103485

I'm gonna guess that the editor you are using doesn't work with UTF-8, and is converting everything to ASCII.

The simple answer is to stop using special characters in HTML pages. The copyright symbol should be written as &copy; or &#169;.

Upvotes: 2

AnthonyWJones

Reputation: 189437

The fundamental problem is the impact of Response.Codepage on form posts.

When you send a form to a client specifying that the content is encoded as UTF-8, the browser will assume that the content of form posts should be sent encoded as UTF-8.

Now the action page that receives the post will (somewhat counter-intuitively) use the value of Response.Codepage to determine how the characters in the post are encoded. This isn't obvious, because we tend to think it's the job of the sender to define the encoding of what it's sending. Nor is it a natural leap to think that a property governing the encoding of the response we want to send would have anything to do with how the incoming request is read. In this case it does.

What's happening is that your form is posting a UTF-8 encoded version of the character, but the page that receives it does not have its Response.Codepage set to 65001 (the UTF-8 codepage). It's probably set to the system's default codepage, such as 1252. Hence the two-byte UTF-8 encoding of the character gets interpreted as two individual characters.
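To make the byte-level effect concrete, here is a small sketch in Python (not ASP, purely for illustration). It encodes ® as UTF-8 and then decodes the same bytes first as UTF-8 and then as Windows-1252, which is the mismatch described above. The variable names are made up for the example.

```python
# Illustration only: what happens to the bytes of the registered sign
# when sender and receiver disagree about the encoding.
REGISTERED = "\u00ae"  # the registered trademark sign

# The browser sends the character as UTF-8: two bytes, 0xC2 0xAE.
utf8_bytes = REGISTERED.encode("utf-8")

# Receiver codepage matches (65001 / UTF-8): one character comes back.
as_utf8 = utf8_bytes.decode("utf-8")

# Receiver codepage is Windows-1252: each byte becomes its own character,
# 0xC2 -> A with circumflex, 0xAE -> registered sign.
as_cp1252 = utf8_bytes.decode("cp1252")

print(len(as_utf8), len(as_cp1252))  # 1 2
```

The second decode is where the extra 'Â' in front of the ® comes from.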

My recommendations for good character handling in ASP are:-

  • Save all pages as UTF-8
  • Include <%@ codepage=65001 %> at the top of all pages
  • Include <% Response.CharSet = "UTF-8" %> at the top of all pages
  • Store posted data in a Unicode field type such as SQL Server's NVARCHAR type.

The important thing here is that before you read form values in an ASP page, you need to make sure that Response.Codepage is set to a codepage that matches the sender's encoding, and this doesn't happen automatically.
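The same point can be simulated outside ASP. The following Python sketch stands in for the receiving page: the hypothetical read_form helper decodes a URL-encoded form body using whatever codepage the receiver is configured with, which is roughly the role Response.Codepage plays according to the explanation above. The field name "notes" and the sample value are invented for the example.

```python
from urllib.parse import urlencode, parse_qs

def read_form(body: str, codepage: str) -> dict:
    """Decode a URL-encoded form body using the receiver's codepage.

    Hypothetical helper: a stand-in for reading posted form values
    on a page whose codepage is `codepage`.
    """
    return {key: values[0] for key, values in parse_qs(body, encoding=codepage).items()}

# The browser posts the field as UTF-8 because the form page was served as UTF-8.
body = urlencode({"notes": "Acme\u00ae"}, encoding="utf-8")  # 'notes=Acme%C2%AE'

# Receiver codepage matches the sender: the value round-trips intact.
matched = read_form(body, "utf-8")

# Receiver left on a Windows-1252 default: the value gains a stray A-circumflex.
mismatched = read_form(body, "cp1252")
```

Nothing in the posted body says which encoding was used, which is why the receiver has to be told explicitly.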

Upvotes: 12

mercator

Reputation: 28656

Â® is what ® looks like if it's stored as UTF-8, but displayed as ASCII/ISO-8859-1/Windows-1252. Using the meta tag is not enough to make sure your page is being served as UTF-8. You will also need to set the encoding in the Content-Type HTTP header. This header is typically set either with some server-wide setting or programmatically.

I don't know ASP, but this seems to be how you should set that header:

HtmlEncode UTF-8

And this might provide some more information:

http://technet.microsoft.com/en-us/library/bb742422.aspx#EBAA

If your data is stored in a database, you'll also need to make sure the data is either stored in UTF-8 as well, or converted when storing and retrieving it.

Upvotes: 0

JasonMichael

Reputation: 2581

From my experience with this exact problem, I found that these characters popped up a lot because 1) the user was using a non-English character set (and keyboard) when the content was entered (e.g. Spanish), and 2) the content was not converted to UTF-8. You're on the right track checking the content type in the header, but you really have to run the content through a converter as well if this keeps happening. This problem caused me hours of pain, many years ago, with Classic ASP (I wish I still had access to the code to be of further help).
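The original converter code is not available, so the following is only a guess at what a clean-up step for already-damaged text might look like, sketched in Python rather than ASP. The function name repair_mojibake is hypothetical, and it assumes exactly one round of damage in which UTF-8 bytes were read as Windows-1252.

```python
def repair_mojibake(text: str, wrong: str = "cp1252") -> str:
    """Undo one round of UTF-8 bytes having been decoded as `wrong`.

    Hypothetical helper, assuming the damage pattern described in this
    thread. Text that does not fit the pattern is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as intended.
        return text.encode(wrong).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not the expected damage pattern; leave the text untouched.
        return text

damaged = "Acme\u00c2\u00ae"          # "Acme" + A-circumflex + registered sign
repaired = repair_mojibake(damaged)   # "Acme" + registered sign
```

A heuristic like this can misfire on text that legitimately contains such character pairs, so it is better suited to a one-off repair of stored data than to every request. Fixing the codepage on the receiving page avoids the damage in the first place.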

Upvotes: 1
