Void

Reputation: 63

Facebook charset detection mechanism?

Today I looked at the HTML source of facebook.com and found something like this:

<input type="hidden" value="&euro;,&acute;,€,´,水,Д,Є" name="charset_test"/>

It appears twice inside the <form>...</form>.

Any idea what this code might be useful for? Is it some kind of server-side detection of the client's charset? As far as I know, the browser charset is transmitted in the HTTP request anyway (in an "Accept-Charset" header).

Upvotes: 6

Views: 3808

Answers (4)

dan04

Reputation: 91179

Any idea what this code might be useful for - some kind of server-side client charset detection?

Apparently so.

The Euro sign is useful for charset detection because there are so many ways of encoding it:

  • E2 82 AC in UTF-8
  • 88 in windows-1251
  • 80 in the other windows-125x encodings
  • A4 in ISO-8859-7, -15, and -16
  • A2 E3 in GB18030
  • 85 40 in Shift-JIS
  • etc.
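To see the byte sequences in the list above for yourself, here is a small Python sketch. The codec names are Python's own aliases, and `shift_jisx0213` is my assumption for the Shift-JIS variant meant here, since plain Shift-JIS has no Euro sign. The helper name `euro_bytes` is made up for illustration.

```python
from typing import Optional

EURO = "\u20ac"  # the Euro sign

# Python codec names; shift_jisx0213 is an assumed stand-in for "Shift-JIS".
CODECS = [
    "utf-8", "cp1251", "cp1252", "iso8859_7", "iso8859_15",
    "iso8859_16", "gb18030", "shift_jisx0213",
]

def euro_bytes(codec: str) -> Optional[str]:
    """Return the Euro sign's bytes in `codec` as hex, or None if unencodable."""
    try:
        data = EURO.encode(codec)
    except UnicodeEncodeError:
        return None
    return " ".join(f"{b:02X}" for b in data)

if __name__ == "__main__":
    for codec in CODECS:
        print(f"{codec:16} {euro_bytes(codec) or '(not representable)'}")
```

Because each encoding produces a different byte pattern, the bytes a server receives for this one character already narrow down which charset the client used.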

As far as I know, browser charset is being transmitted in HTTP request anyway (an "Accept-Charset" header).

It's supposed to be transmitted in the HTTP Content-Type header, but that doesn't mean that user agents actually get it right.
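For illustration, this is roughly how a server could read a declared charset from a Content-Type value, using only Python's standard library. The function name `charset_from_content_type` is hypothetical, and whether a given browser sends the parameter at all is exactly the open question.

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter, or None if none was declared."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()
```

When this returns None, the server has to guess, which is where a known test field becomes useful.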

Upvotes: 4

troelskn

Reputation: 117567

As Pekka says, this makes it possible to detect the charset of the request. HTTP gives the server no reliable way to learn the charset of submitted form data, so one has to rely on conventions outside of the protocol. Browsers are generally predictable, but this trick is the only way to be 100% sure.

See also: http://www.phpwact.org/php/i18n/charsets

Upvotes: 0

YOU

Reputation: 123881

&euro;,&acute;,€,´,水,Д,Є

I guess some browsers send &euro; the same as € and &acute; the same as ´,

so they can check something like charset_test[0] == charset_test[2] and charset_test[1] == charset_test[3]

For the other characters, I have no clue. 水 probably tests for CJK.
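The comparison guessed at above could look something like this in Python. This is only a sketch: it assumes the server has already decoded the field to a string and that the values are comma-separated, and `entities_match` is a made-up name, not anything Facebook is known to use.

```python
def entities_match(charset_test: str) -> bool:
    """True if the entity-derived characters equal the literal ones."""
    parts = charset_test.split(",")
    if len(parts) < 4:
        return False  # field was truncated or mangled in transit
    # parts[0], parts[1] came from &euro; and &acute;
    # parts[2], parts[3] were literal characters in the page
    return parts[0] == parts[2] and parts[1] == parts[3]
```

If the two pairs differ, the literal characters were presumably damaged somewhere between the page and the submitted request.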

Upvotes: 0

Pekka

Reputation: 449613

I guess they match this in the receiving script to make sure the client sent the request properly encoded as UTF-8. Because they know which characters to expect, they could maybe even detect the actual encoding on the fly.
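A rough Python sketch of that kind of on-the-fly detection, assuming the server can get at the raw bytes of the field. The candidate list and the name `detect_charset` are my own choices for illustration. Note that a single-byte charset cannot represent every test character, so a real check would also have to cope with whatever the browser substitutes.

```python
from typing import Optional, Sequence

EXPECTED = "€,´,€,´,水,Д,Є"  # the value the page was served with

# Assumed candidates, tried in order; extend as needed.
CANDIDATES = ("utf-8", "gb18030", "cp1251", "cp1252")

def detect_charset(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first codec under which `raw` decodes to EXPECTED, else None."""
    for codec in candidates:
        try:
            if raw.decode(codec) == EXPECTED:
                return codec
        except UnicodeDecodeError:
            continue  # bytes are not valid in this codec, try the next
    return None
```

Once a codec is found this way, the same codec can be used to decode the rest of the submitted fields.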

If I remember correctly - I had to deal with it once - there have been problems with form encoding in IE6 in some situations.

Upvotes: 3
