Reputation: 249
I have an application that uses a large amount of strings. So I have some problem of memory usage. I know that one of the best solution in this case is to use a DB, but I cannot use this for the moment, so I am looking for others solutions.
In C# string are store in Utf16, that means I lost half of the memory usage compare to Utf8 (for the major part of my strings). So I decided to use byte array of utf8 string. But to my surprise this solution took twice more memory space than simple strings in my application.
So I have done some simple test, but I want to know the opinion of experts to be sure.
Test 1 : Fixed length strings allocation
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var stringGen = new Random(561651);
for (int i = 0; i < 10000; i++) {
for (int j = 0; j < 10000; j++) {
Sb.Append((stringGen.Next(90)+32).ToString());
}
stringArray[i] = Sb.ToString();
byteArray[i] = utf8.GetBytes(Sb.ToString());
Sb.Clear();
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
00007ffac1fd02b8 56 152400 System.Object[]
000000bf7655fcf0 303 3933750 Free
00007ffac1fd5738 10004 224695091 System.Byte[]
00007ffac1fcfc40 10476 449178396 System.String
As we can see, bytes arrays take twice less memory space, no real surprise here.
Test 2 : Random size string allocation (with a realistic length)
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var lengthGen = new Random(2138784);
for (int i = 0; i < 10000; i++) {
for (int j = 0; j < lengthGen.Next(100); j++) {
Sb.Append(i.ToString());
stringArray[i] = Sb.ToString();
byteArray[i] = utf8.GetBytes(Sb.ToString());
}
Sb.Clear();
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
000000be2aa8fd40 12 82784 Free
00007ffac1fd02b8 56 152400 System.Object[]
00007ffac1fd5738 9896 682260 System.Byte[]
00007ffac1fcfc40 10368 1155110 System.String
String takes a little less space than twice time the memory space of byte array. With shorter string I was expecting a greater overhead for strings. But it seems that the opposite is, why?
Test 3 : String model corresponding to my application
var stringArray = new string[10000];
var byteArray = new byte[10000][];
var Sb = new StringBuilder();
var utf8 = Encoding.UTF8;
var lengthGen = new Random();
for (int i=0; i < 10000; i++) {
if (i%2 == 0) {
for (int j = 0; j < lengthGen.Next(100000); j++) {
Sb.Append(i.ToString());
stringArray[i] = Sb.ToString();
byteArray[i] = utf8.GetBytes(Sb.ToString());
Sb.Clear();
}
} else {
stringArray[i] = Sb.ToString();
byteArray[i] = utf8.GetBytes(Sb.ToString());
Sb.Clear();
}
}
GC.Collect();
GC.WaitForFullGCComplete(5000);
Memory Usage
00007ffac200a510 1 80032 System.Byte[][]
00007ffac1fd02b8 56 152400 System.Object[]
00007ffac1fcfc40 5476 198364 System.String
00007ffac1fd5738 10004 270075 System.Byte[]
Here strings take much less memory space than byte. This can be surprising, but I supposed that empty string are referenced only once. Is it? But I don't know if this can explain all that huge difference. Is it any other reason? What is the best solution?
Upvotes: 12
Views: 3668
Reputation: 726519
This can be surprising, but I supposed that empty string are referenced only once.
Yes, an empty StringBuilder
returns string.Empty
as its result. The code snippet below prints True
:
var sb = new StringBuilder();
Console.WriteLine(object.ReferenceEquals(sb.ToString(), string.Empty));
But I don't know if this can explain all that huge difference.
Yes, this perfectly explains it. You are saving on 5,000 string
objects. The difference in bytes is roughly 270,000-(198,000/2), so about 170 kBytes. Dividing by 5 you get 34 bytes per object, which is roughly the size of a pointer on a 32-bit system.
What is the best solution?
Do the same thing: make yourself a private static readonly
empty array, and use it each time that you get string.Empty
from sb.ToString()
:
private static readonly EmptyBytes = new byte[0];
...
else
{
stringArray[i] = Sb.ToString();
if (stringArray[i] == string.Empty) {
byteArray[i] = EmptyBytes;
} else {
byteArray[i] = utf8.GetBytes(Sb.ToString());
}
Sb.Clear();
}
Upvotes: 5