Reputation: 16722
I'm trying to achieve maximum performance in a library that needs to convert C# strings to byte[] before sending them off somewhere.
Since a char takes at most 4 bytes when encoded as UTF-8, my current approach is to preallocate a large byte[]. When a string arrives for encoding, I can use System.Text.Encoder to populate the byte array with it. If the string's char length * 4 is bigger than my buffer, I allocate a new one (optimizing via a buffer pool), but this should become relatively rare quite quickly.
My only issue with this solution is that System.Text.Encoder doesn't appear to accept a string - only a char[]. Retrieving a char[] from the string involves another seemingly needless copy. There's an unsafe version with char pointers, but I'm prohibited from using that in my library at the moment.
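To illustrate, here is a minimal sketch of the workaround described above, with the extra char[] copy marked. Names and values are illustrative, not from my actual library:

```csharp
using System;
using System.Text;

class EncoderWorkaround
{
    static void Main()
    {
        string input = "hello";
        // Worst-case preallocation: 4 bytes per char.
        byte[] buffer = new byte[input.Length * 4];

        Encoder encoder = Encoding.UTF8.GetEncoder();
        char[] chars = input.ToCharArray(); // the seemingly needless copy
        int bytesWritten = encoder.GetBytes(chars, 0, chars.Length, buffer, 0, flush: true);

        Console.WriteLine(bytesWritten); // 5 bytes for "hello"
    }
}
```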
As a side note, StringWriter, which also does UTF8 conversion, maintains an internal buffer. This is again unsuitable - I need my own buffer since I encode other data types as well (e.g. ints).
So does anyone have any idea why Encoder doesn't provide a method that works directly on a String, or a way around this?
Upvotes: 2
Views: 5170
Reputation: 19781
Look at the Encoding classes; they wrap the Encoder classes.
It sounds like you should stick with the built-in text encodings until they have been proven inefficient. There's a UTF8Encoding.GetBytes(String, Int32, Int32, Byte[], Int32) overload which will take your string and write directly to your pre-allocated byte array.
There's also a UTF8Encoding.GetByteCount(String) that can calculate the size of the byte array before you allocate memory for it.
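For example, a small sketch combining the two calls (the buffer size and input string are placeholders):

```csharp
using System;
using System.Text;

class DirectEncode
{
    static void Main()
    {
        string input = "héllo";
        byte[] buffer = new byte[256]; // pre-allocated, reused across calls

        // Writes straight from the string into the caller's buffer - no char[] copy.
        int bytesWritten = Encoding.UTF8.GetBytes(input, 0, input.Length, buffer, 0);

        // GetByteCount gives the exact size needed, if the buffer must grow first.
        int needed = Encoding.UTF8.GetByteCount(input);

        Console.WriteLine($"{bytesWritten} {needed}"); // both 6: 'é' takes 2 bytes in UTF-8
    }
}
```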
Upvotes: 5