Homde

Reputation: 4286

Optimizing memory requirements for Dictionary keys

Say that I have a (large) collection of Dictionary instances. The key in each dictionary is always one of, say, 10 known strings.

If the collection contains 1,000,000 entries, will that string key occupy memory for each instance and key? Is there a good way to optimize a case like this, perhaps using string interning?

Another way would be to use, say, a short for the key instead and translate between the string and the short, but the syntax gets a bit messy...

Upvotes: 3

Views: 1561

Answers (5)

Aliostad

Reputation: 81660

Usually not. Each distinct string is stored as a single immutable object, and the dictionaries only hold references to it. Strings can also be interned, which helps save memory.

But it depends. If you construct the string every time (e.g. by concatenation at run time), the results will not be interned. Defining the keys as constants ensures they are interned.

You can check if two strings are the same in memory using object.ReferenceEquals().
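As a minimal sketch of that check (the class and helper names here are made up for illustration), the following compares a literal against a second literal and against a copy built at run time:

```csharp
using System;

public static class InternCheck
{
    // Hypothetical helper: builds a fresh string instance at run time,
    // so the compiler cannot fold it into an interned literal.
    public static string BuildAtRuntime(string source)
    {
        return new string(source.ToCharArray());
    }

    public static void Main()
    {
        string a = "hello";                 // literal, interned
        string b = "hello";                 // same literal, same instance
        string c = BuildAtRuntime("hello"); // equal content, separate instance

        Console.WriteLine(object.ReferenceEquals(a, b)); // True
        Console.WriteLine(object.ReferenceEquals(a, c)); // False
        Console.WriteLine(a == c);                       // True (value equality)
    }
}
```

The last line is a reminder that == on strings compares content, so only ReferenceEquals tells you whether memory is actually shared.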

Upvotes: 2

Jim Mischel

Reputation: 133950

As others have said, it depends on how you're getting the strings to put into your list. A couple of examples should help.

Imagine you have a text file that contains 1,000 lines, all the same. That is, a file has "hello" repeated 1,000 times:

hello
hello
hello
...

If you write a program to read that file into a List<string> the naive way, then there will be 1,000 different string instances. That is:

var myList = new List<string>();
using (var reader = new StreamReader("filename"))  // dispose the reader when done
{
    string s;
    while ((s = reader.ReadLine()) != null)
    {
        myList.Add(s);  // each string is a unique instance
    }
}

If memory is a concern, then what you want to do is maintain a lookup table with string keys and string values that maps each distinct string to a single canonical instance. Then, whenever a duplicate string comes in, you get back a reference to the first instance.

var KeyLookup = new Dictionary<string, string>();
string AddString(string key)
{
    string value;
    if (!KeyLookup.TryGetValue(key, out value))
    {
        value = key;
        KeyLookup.Add(key, value);
    }
    return value;
}

And then when you read the file:

while ((s = reader.ReadLine()) != null)
{
    myList.Add(AddString(s));  // duplicate strings use the same instance
}

In this case, there will be only one instance of the string "hello" in the program.

You can do something similar with the keys in your lists. Create a lookup table for your keys and make sure that whenever you add a key to your list, you add the value from the lookup table rather than the key itself.

As others have pointed out, if your known keys are already constants and you always use the constant values when adding them to the lists, then the strings are already interned and the above isn't required.

Upvotes: 0

xanatos

Reputation: 111810

Constant strings are interned (so string str = "hello"; is interned). Other strings normally aren't. You can force a string to be interned using the static String.Intern method, but be sure to read about the side effects at http://msdn.microsoft.com/en-us/library/system.string.intern.aspx . Remember that if you have a const hello string and a dynamically built hello string, only the first will be interned. Sometimes you can save a little memory by interning often-used strings.

In your situation, if you are using only a small number of strings that are already stored in another collection AND you copy these strings (var str2 = str1), then you aren't duplicating the string, only creating another reference to it. BUT if you obtain the new string by manipulating the old one (var str2 = ("Z" + str1).Substring(1)), then you really are creating a new string instead of referencing the old one.
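A small sketch of the difference, assuming default runtime settings for literal interning (the class and method names are hypothetical):

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: produces an equal string through manipulation,
    // which allocates a new instance rather than reusing the original.
    public static string BuildCopy(string original)
    {
        return ("Z" + original).Substring(1);
    }

    public static void Main()
    {
        string str1 = "hello";           // literal, interned
        string str2 = str1;              // just another reference
        string str3 = BuildCopy(str1);   // new, non-interned instance

        Console.WriteLine(object.ReferenceEquals(str1, str2)); // True
        Console.WriteLine(object.ReferenceEquals(str1, str3)); // False

        // String.Intern returns the pooled instance with the same content.
        string str4 = String.Intern(str3);
        Console.WriteLine(object.ReferenceEquals(str1, str4)); // True
    }
}
```

After the String.Intern call, str3 itself is unchanged; it is the returned reference (str4) that you would store as the dictionary key so that the duplicate can be garbage collected.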

Upvotes: 1

Hans Passant

Reputation: 941208

String is a reference type. The dictionary contains a reference to the actual string object, 4 bytes on a 32-bit operating system. Adding the same string to multiple dictionaries produces only one copy of the string.
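To illustrate, here is a rough sketch (the names and the count of 1000 are arbitrary choices for the example) that puts one string into many dictionaries and confirms that they all hold the same instance:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SharedKeyDemo
{
    // Hypothetical helper: creates `count` dictionaries that all use the same key reference.
    public static List<Dictionary<string, int>> Build(string key, int count)
    {
        var list = new List<Dictionary<string, int>>(count);
        for (int i = 0; i < count; i++)
        {
            list.Add(new Dictionary<string, int> { { key, i } });
        }
        return list;
    }

    public static void Main()
    {
        string key = "color";
        var dictionaries = Build(key, 1000);

        string first = dictionaries[0].Keys.First();
        string last = dictionaries[999].Keys.First();

        // Both dictionaries store a reference to the one string object.
        Console.WriteLine(object.ReferenceEquals(first, last)); // True
    }
}
```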

You already have what you are looking for.

Upvotes: 1

Shekhar_Pro

Reputation: 18420

I think to save space or optimize it we can do one of these.

  • Create 10 different List<int> instances, one for each item name (string), and then search all 10 lists for the item when retrieving.

  • Or create a dictionary of lists, like Dictionary<string, List<int>>, and store each key in the list for its item name (string). It's almost the same as above but allows you to add more item names in the future.

And I believe we will still get better performance.

(However, I would like others to comment on my assumption.)

Also, if you have about 100,000 records, you would be better off storing them in a database with two tables: one containing the item name string and an ID, and the other containing the key and a foreign key to the item ID.

Upvotes: 0
