Reputation: 16722
I'm trying to achieve maximum performance in a library that needs to convert C# strings to byte[] before sending them off somewhere.
Since a char takes at most 4 bytes when encoded as UTF-8, my current approach is to preallocate a large byte[]. When a string arrives for encoding, I can use System.Text.Encoder to populate the byte array with it. If the string's char length * 4 is bigger than my buffer, I allocate a new one (optimizing via a buffer pool), but this should become relatively rare quite quickly.
My only issue with this solution is that System.Text.Encoder doesn't appear to accept a string - only a char[]. Retrieving a char[] from the string involves another seemingly needless copy. There's an unsafe version with char pointers, but I'm prohibited from using that in my library at the moment.
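To illustrate, here is a minimal sketch of the workaround described above, with the extra char[] copy marked. Names and values are illustrative, not from my actual library:

```csharp
using System;
using System.Text;

class EncoderWorkaround
{
    static void Main()
    {
        string input = "hello";
        // Worst-case preallocation: 4 bytes per char.
        byte[] buffer = new byte[input.Length * 4];

        Encoder encoder = Encoding.UTF8.GetEncoder();
        char[] chars = input.ToCharArray(); // the seemingly needless copy
        int bytesWritten = encoder.GetBytes(chars, 0, chars.Length, buffer, 0, flush: true);

        Console.WriteLine(bytesWritten); // 5 bytes for "hello"
    }
}
```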
As a side note, StringWriter, which also does UTF8 conversion, maintains an internal buffer. This is again unsuitable - I need my own buffer since I encode other data types as well (e.g. ints).
So does anyone have any idea why Encoder doesn't provide a method that works directly on a String, or a way around this?
Upvotes: 2
Views: 5170
Reputation: 19781
Look at the Encoding classes; they wrap the Encoder classes.
It sounds like you should stick with the built-in text encodings until they have been proven inefficient. There's a UTF8Encoding.GetBytes(String, Int32, Int32, Byte[], Int32) overload which will take your string and write directly to your pre-allocated byte array.
There's also a UTF8Encoding.GetByteCount(String) that can calculate the size of the byte array before you allocate memory for it.
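For example, a small sketch combining the two calls (the buffer size and input string are placeholders):

```csharp
using System;
using System.Text;

class DirectEncode
{
    static void Main()
    {
        string input = "héllo";
        byte[] buffer = new byte[256]; // pre-allocated, reused across calls

        // Writes straight from the string into the caller's buffer - no char[] copy.
        int bytesWritten = Encoding.UTF8.GetBytes(input, 0, input.Length, buffer, 0);

        // GetByteCount gives the exact size needed, if the buffer must grow first.
        int needed = Encoding.UTF8.GetByteCount(input);

        Console.WriteLine($"{bytesWritten} {needed}"); // both 6: 'é' takes 2 bytes in UTF-8
    }
}
```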
Upvotes: 5