Reputation: 1967
I need to limit the output byte[]
length encoded with UTF-8 encoding. Eg. byte[]
length must be less than or equals 1000
First I wrote the following code
int maxValue = 1000;
if (text.Length > maxValue)
text = text.Substring(0, maxValue);
var textInBytes = Encoding.UTF8.GetBytes(text);
works good if string is just using ASCII characters, because 1 byte per character. But if characters goes beyond that it could be 2 or 3 or even 6 bytes per character. That would be a problem with the above code. So to fix that problem I wrote this.
List<byte> textInBytesList = new List<byte>();
char[] textInChars = text.ToCharArray();
for (int a = 0; a < textInChars.Length; a++)
{
byte[] valueInBytes = Encoding.UTF8.GetBytes(textInChars, a, 1);
if ((textInBytesList.Count + valueInBytes.Length) > maxValue)
break;
textInBytesList.AddRange(valueInBytes);
}
I haven't tested code, but Im sure it will work as I want. However, I dont like the way it is done, is there any better way to do this ? Something I'm missing ? or not aware of ?
Thank you.
Upvotes: 0
Views: 2705
Reputation: 11
My first posting on Stack Overflow, so be gentle! This method should take care of things pretty quickly for you..
public static byte[] GetBytes(string text, int maxArraySize, Encoding encoding) {
if (string.IsNullOrEmpty(text)) return null;
int tail = Math.Min(text.Length, maxArraySize);
int size = encoding.GetByteCount(text.Substring(0, tail));
while (tail >= 0 && size > maxArraySize) {
size -= encoding.GetByteCount(text.Substring(tail - 1, 1));
--tail;
}
return encoding.GetBytes(text.Substring(0, tail));
}
It's similar to what you're doing, but without the added overhead of the List or having to count from the beginning of the string every time. I start from the other end of the string, and the assumption is, of course, that all characters must be at least one byte. So there's no sense in starting to iterate down through the string any farther in than maxArraySize (or the total length of the string).
Then you can call the method like so..
byte[] bytes = GetBytes(text, 1000, Encoding.UTF8);
Upvotes: 1