Jeroen Wiert Pluimers

Reputation: 24523

Which encoding failure turned "vóór" into "v3/43/4r"?

A while ago, I saw the text "v3/43/4r" in a document.

I know it comes from "vóór" (in Dutch, the acute accents add emphasis), and I wonder which encoding failure produced this result.

Upvotes: 3

Views: 153

Answers (1)

rodrigo

Reputation: 98496

Some time ago I wrote a program that does this kind of analysis semi-automatically (maybe I'll publish it some time...), and here is the result, with a bit of imagination:

  • ó is U+00F3, and it is encoded as the same byte value (0xF3) in a lot of different encodings (most ISO-8859-* and most Western Windows-* codepages).
  • In CP850 the byte 0xF3 is ¾ (U+00BE), the three-quarters character. It is the same in other, less used codepages (CP775, CP856, CP857, CP858).
  • The ¾ is sometimes transliterated to 3/4 when the character is not directly available.

And there you are! "vóór" -> "v¾¾r" -> "v3/43/4r".
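
The chain above can be checked with a few lines of Python. This is only a sketch of the hypothesis, not the analysis program mentioned earlier: the choice of Windows-1252 and CP850 is an assumption, and the final replace is a hypothetical stand-in for whatever software did the ASCII transliteration.

```python
# Step 1: the text is stored in the ANSI codepage (Windows-1252 assumed).
original = "vóór"
raw = original.encode("cp1252")   # each ó becomes the single byte 0xF3

# Step 2: the same bytes are read back as the OEM codepage (CP850 assumed).
misread = raw.decode("cp850")     # byte 0xF3 is ¾ (U+00BE) in CP850

# Step 3: hypothetical ASCII fallback for the fraction character.
ascii_fallback = misread.replace("\u00be", "3/4")

print(raw.hex(), misread, ascii_fallback)
```

Swapping "cp850" for one of the other codepages listed above is an easy way to see whether they would produce the same intermediate text.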

The first step (ó -> ¾) is the usual mix-up between ANSI and OEM codepages on Western versions of Windows (in my country ANSI=Windows-1252, OEM=CP850). You can reproduce it easily: create a file with NOTEPAD, write vóór in it, save it with the ANSI encoding, and dump it in a command prompt with type.

Upvotes: 4
