For a customer project, a query is made against a DB and the results are written to a file. The file is required to be in Shift JIS as it is later used as input for another legacy system. The Wikipedia article indicates that:
The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign (U+00A5) at 0x5C and an overline (U+203E) at 0x7E in place of the ASCII character set's backslash and tilde respectively.
During some testing, I have verified that while the yen sign (U+00A5) properly becomes 0x5C, the overline (U+203E) becomes 0x3F (question mark) rather than the expected 0x7E.
Although my actual output is written to a file with a StreamWriter, the following minimal code reproduces the problem:
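    // Requires: using System; using System.Text;
    static void Test()
    {
        // On .NET Core / .NET 5+, first call:
        // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Get the Shift JIS encoding.
        var encoding = Encoding.GetEncoding("shift_jis");
        // Declare overline (U+203E).
        char c = (char) 0x203E;
        // Get bytes when encoded as Shift JIS.
        var bytes = encoding.GetBytes(c.ToString());
        // Expected 0x7E, but the value returned is 0x3F.
        Console.WriteLine(BitConverter.ToString(bytes));
    }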
Is this behavior correct? I suppose I could subclass EncoderFallback, but this seems like far more work for something that I would have expected to work from the start.
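For reference, an EncoderFallback subclass that restores the two single-byte mappings quoted above could look roughly like this. It is a minimal sketch, assuming C# 8 or later (for the switch expression) and that the legacy system really expects 0x7E for U+203E and 0x5C for U+00A5; the class names `JisRomanEncoderFallback` and `JisRomanEncoderFallbackBuffer` are hypothetical. The fallback only emits a replacement character, which the encoding then encodes normally, so '~' and '\' end up as 0x7E and 0x5C.

```csharp
using System;
using System.Text;

// Hypothetical fallback: replaces U+203E (overline) with '~' and U+00A5 (yen)
// with '\', which code page 932 then encodes as 0x7E and 0x5C respectively.
public sealed class JisRomanEncoderFallback : EncoderFallback
{
    // Each unmappable character is replaced by at most one character.
    public override int MaxCharCount => 1;

    public override EncoderFallbackBuffer CreateFallbackBuffer() =>
        new JisRomanEncoderFallbackBuffer();
}

public sealed class JisRomanEncoderFallbackBuffer : EncoderFallbackBuffer
{
    private string _replacement = string.Empty; // replacement characters for the current failure
    private int _position;                      // index of the next character to hand back

    public override int Remaining => _replacement.Length - _position;

    public override bool Fallback(char charUnknown, int index)
    {
        _replacement = charUnknown switch
        {
            '\u203E' => "~",  // overline -> 0x7E
            '\u00A5' => "\\", // yen sign -> 0x5C
            _ => "?",         // anything else: plain '?' (no best-fit lookup)
        };
        _position = 0;
        return true;
    }

    public override bool Fallback(char charUnknownHigh, char charUnknownLow, int index)
    {
        // Characters outside the BMP (surrogate pairs) have no mapping in code page 932.
        _replacement = "?";
        _position = 0;
        return true;
    }

    public override char GetNextChar() =>
        _position < _replacement.Length ? _replacement[_position++] : '\0';

    public override bool MovePrevious()
    {
        if (_position == 0) return false;
        _position--;
        return true;
    }

    public override void Reset()
    {
        _replacement = string.Empty;
        _position = 0;
    }
}
```

It would be used as `Encoding.GetEncoding("shift_jis", new JisRomanEncoderFallback(), DecoderFallback.ReplacementFallback)`, and the resulting encoding can be passed to `new StreamWriter(path, false, encoding)`. Note that supplying a custom fallback replaces the built-in best-fit fallback entirely, so any other best-fit substitutions it used to perform are lost; if only these two characters matter, calling `Replace('\u203E', '~')` on the string before writing is a simpler alternative.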
Upon further investigation, I have concluded that "Shift JIS" is a misnomer here: the encoding returned for "shift_jis" is actually code page 932. Unicode and Microsoft provide a mapping table between code page 932 and Unicode, and this is apparently what is being used to map the characters. Notice that it contains neither the (0x5C, U+00A5) nor the (0x7E, U+203E) mapping; in that table, 0x5C and 0x7E map to the ASCII backslash (U+005C) and tilde (U+007E).
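One way to check this from code is to ask the "shift_jis" encoding which code page it actually is, and to decode the two disputed bytes. This is a sketch assuming .NET Core / .NET 5+, where the `RegisterProvider` call is needed to make legacy code pages available (on .NET Framework that line can be dropped); `CodePageCheck` is a hypothetical name. If the code page 932 table is what is being applied, 0x5C and 0x7E should come back as U+005C and U+007E rather than U+00A5 and U+203E.

```csharp
using System;
using System.Text;

public static class CodePageCheck
{
    // Prints which code page the name "shift_jis" resolves to, and what
    // the bytes 0x5C and 0x7E decode to under that code page.
    public static void Report()
    {
        // Needed on .NET Core / .NET 5+ to make legacy code pages available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding sjis = Encoding.GetEncoding("shift_jis");
        Console.WriteLine($"{sjis.WebName} -> code page {sjis.CodePage}");

        // The two single-byte values that the quoted Wikipedia text says differ from ASCII.
        string decoded = sjis.GetString(new byte[] { 0x5C, 0x7E });
        foreach (char ch in decoded)
        {
            Console.WriteLine($"U+{(int)ch:X4}");
        }
    }
}
```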
Note, though, that I wrote in the original question that "I have verified that while the yen sign (U+00A5) properly becomes 0x5C". Apparently, the Encoding.GetEncoding(String) method returns an encoding whose EncoderFallback is System.Text.InternalEncoderBestFitFallback, which presumably supplies best-fit mappings for some characters that would otherwise fail. It must contain an additional mapping for the yen sign (U+00A5), but unfortunately none for the overline (U+203E). When I replace it with EncoderExceptionFallback, encoding fails for both characters.
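The experiment described above can be reproduced roughly as follows: print the type of the default encoder fallback, then request the same encoding with an exception fallback so that any character missing from the code page 932 table throws instead of being silently substituted. This is a sketch assuming .NET Core / .NET 5+ (hence `RegisterProvider`); `FallbackCheck` and `HasDirectMapping` are hypothetical names, and the printed fallback type name is an internal implementation detail that may differ between runtimes.

```csharp
using System;
using System.Text;

public static class FallbackCheck
{
    // Returns true if 'ch' can be encoded by 'strict' without any fallback.
    // 'strict' must use an exception fallback for this test to be meaningful.
    public static bool HasDirectMapping(Encoding strict, char ch)
    {
        try
        {
            strict.GetBytes(ch.ToString());
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }

    public static void Report()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Default instance: shows which (internal) fallback is installed.
        Encoding defaultSjis = Encoding.GetEncoding("shift_jis");
        Console.WriteLine(defaultSjis.EncoderFallback.GetType().FullName);

        // Strict instance: unmappable characters throw EncoderFallbackException.
        Encoding strictSjis = Encoding.GetEncoding(
            "shift_jis",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        foreach (char ch in new[] { '\u00A5', '\u203E', '\\', '~' })
        {
            string status = HasDirectMapping(strictSjis, ch) ? "mapped" : "no mapping";
            Console.WriteLine($"U+{(int)ch:X4}: {status}");
        }
    }
}
```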
Hence, I conclude that this is an error if the encoding is taken to be Shift JIS, but it is the expected result for code page 932.