Reputation: 827
I have been looking for examples of custom encodings in .NET. Say I want to implement roman8 encoding/decoding in .NET; how do I do that? In a nutshell, I know that we need to inherit from the Encoding system class and implement our own encoder/decoder methods, but without examples it looks complicated. There is one example I can find from Jon Skeet, but it is too old to follow, in my opinion.
Any help would be appreciated. Thanks!
Upvotes: 1
Views: 2377
Reputation: 485
Now that .NET is open source, you can view the source code of the encodings included in the framework.
It looks like the Unicode implementations use interop to call native code to do the actual work, but a few are implemented entirely in C#, such as ISCIIEncoding.
Here is the source: https://referencesource.microsoft.com/#mscorlib/system/text/isciiencoding.cs
To create an implementation for a new encoding, you need to subclass System.Text.Encoding and implement the following methods. I'm assuming you're using a simple 1:1 encoding like roman8, where every char maps to exactly one byte; if not, things will be a bit more complicated!
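GetByteCount() and GetCharCount() both return the number of bytes/chars the input will produce. In this case we can just return the count argument, i.e. the number of input items being converted (which is not necessarily the length of the whole input array).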
GetMaxByteCount() and GetMaxCharCount() are similar, but return the theoretical maximum number of items which could be returned for an input of the given length. Once again, we can just return the same length.
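As a rough sketch of the four count methods described above, assuming a strict one-byte-per-char encoding: the class name CountOnlyEncodingSketch is hypothetical, and the two conversion methods are stubbed out only so that the fragment compiles on its own.

```csharp
using System;
using System.Text;

// Hypothetical class name, for illustration only.
public class CountOnlyEncodingSketch : Encoding
{
    // One byte per char, so the answer is the number of chars being converted.
    public override int GetByteCount(char[] chars, int index, int count)
    {
        return count;
    }

    // One char per byte, so the answer is the number of bytes being converted.
    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        return count;
    }

    // The worst case equals the normal case for a 1:1 encoding.
    public override int GetMaxByteCount(int charCount)
    {
        return charCount;
    }

    public override int GetMaxCharCount(int byteCount)
    {
        return byteCount;
    }

    // Stubs: the real conversion logic is a separate step and is not shown here.
    public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
    {
        throw new NotSupportedException("Conversion is omitted from this sketch.");
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
    {
        throw new NotSupportedException("Conversion is omitted from this sketch.");
    }
}
```

Note that the count overloads return count rather than chars.Length or bytes.Length, because callers may ask about a slice of a larger array.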
To do the actual conversion, the following methods will be called. The base Encoding class will take care of creating the arrays for you; you just need to fill in the output with the correct values.
public override int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex)
{
    for (var i = 0; i < charCount; i++)
    {
        bytes[byteIndex + i] = GetByte(chars[charIndex + i]);
    }
    return charCount;
}
Here GetByte() is a simple helper method (one you write yourself) that looks up the index of the char in your array.
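One possible shape for such a helper, as a minimal sketch: it builds a reverse dictionary once from the 256-entry char table, so each lookup avoids a linear search. The names ByteLookupSketch and reverseMap are hypothetical, the table here is a placeholder identity mapping (not the real roman8 table), and falling back to '?' for unmapped chars is an assumption about the behaviour you want.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class ByteLookupSketch
{
    // Placeholder table: index i holds the char with code point i.
    // This is NOT roman8; substitute the real 256-entry table.
    private static readonly char[] conversionArray = BuildPlaceholderTable();

    // Reverse lookup from char to byte, built once from the table above.
    private static readonly Dictionary<char, byte> reverseMap = BuildReverseMap(conversionArray);

    private static char[] BuildPlaceholderTable()
    {
        var table = new char[256];
        for (var i = 0; i < table.Length; i++)
        {
            table[i] = (char)i;
        }
        return table;
    }

    private static Dictionary<char, byte> BuildReverseMap(char[] table)
    {
        var map = new Dictionary<char, byte>();
        for (var i = 0; i < table.Length; i++)
        {
            // If a char appears more than once, the lowest index wins.
            if (!map.ContainsKey(table[i]))
            {
                map[table[i]] = (byte)i;
            }
        }
        return map;
    }

    // Returns the byte for a char, or '?' when the char is not in the table.
    public static byte GetByte(char c)
    {
        byte b;
        return reverseMap.TryGetValue(c, out b) ? b : (byte)'?';
    }
}
```

A plain Array.IndexOf over the table would also work; the dictionary only matters if you convert long strings.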
public override int GetChars(byte[] bytes, int byteIndex, int byteCount, char[] chars, int charIndex)
{
    for (var i = 0; i < byteCount; i++)
    {
        chars[charIndex + i] = conversionArray[bytes[byteIndex + i]];
    }
    return byteCount;
}
Populate conversionArray with your characters, each at the index that matches its byte value in the encoding.
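A minimal sketch of one way to populate such a table, assuming the lower half (0 to 127) is plain ASCII and the upper half comes from a mapping you supply yourself. ConversionTableSketch and Build are hypothetical names, and no real roman8 values are included here; take those from the published code page.

```csharp
using System.Collections.Generic;

// Hypothetical builder class, for illustration only.
public static class ConversionTableSketch
{
    // Builds a 256-entry table: ASCII for 0..127, U+FFFD for unassigned bytes,
    // then the caller-supplied entries on top.
    public static char[] Build(IDictionary<byte, char> upperHalf)
    {
        var table = new char[256];
        for (var i = 0; i < table.Length; i++)
        {
            table[i] = i < 128 ? (char)i : '\uFFFD';
        }
        foreach (var entry in upperHalf)
        {
            table[entry.Key] = entry.Value;
        }
        return table;
    }
}
```

Usage would look something like Build(new Dictionary<byte, char> { { 0x80, 'X' } }), where the entry shown is a made-up placeholder rather than a real mapping. Marking unassigned bytes with U+FFFD (the Unicode replacement character) is a design choice that makes gaps in the table visible in decoded output.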
See https://dotnetfiddle.net/eBvgc6 for a working example.
Upvotes: 2