John Smith
John Smith

Reputation:

The cost of casting?

I have a key/value pairs mapped by hash (ID) in Dictionary<string, Dictionary<string, string>> called id2key_value. You can think of this as a way to represent database-like table with rows.

I added some helper functions to ease the use of some basic datatypes by doing a cast, like

public int GetInt(string id, string key)
{
    int value = 0;
    bool success = int.TryParse(map[id][key], out value);

    if (success)
        return value;
    else
        throw new InvalidOperationException(
            "Trying to obtain a non-integer value with GetInt().");
}

Well, I thought I was being clever when I came up with an idea of a "cast-cache", which basically holds already parsed objects, so I could skip the parsing of the string for int, bool, DateTime, etc., and simply cast them to appropriate datatype from the cache. Like,

public int GetInt(string id, string key)
{
    if (cast_cache.ContainsKey(id) && cast_cache[id].ContainsKey(key))
        return (int) cast_cache[id][key];

    int value = 0;
    bool success = int.TryParse(map[id][key], out value);

    if (success)
    {
        this.AddToCache(id, key, value);

        return value;
    }
    else
        throw new InvalidOperationException(
            "Trying to obtain a non-integer value with GetInt().");
}

The "cast-cache" is simply Dictionary<string, Dictionary<string, object>>.

So, I did some performance testing with 10000 ints added to the map. Then I did one million random retrieves, with and without "cast-caching".

It took 495(ms) without caching and 490(ms) with caching. I also ran a test with DateTime, there the difference was more significant, but less than I expect (~750(ms) non-cached vs ~500(ms) cached).

Not (obviously) understanding the principle of a cast, how costly this operation is and why the performance is so close to the one that "deserializes" from string?

Upvotes: 1

Views: 4375

Answers (5)

Hounshell
Hounshell

Reputation: 5459

There are a couple things to consider in there. First, just so you know the right terminology, this is actually unboxing (you've taken a value-type and stored it as a reference-type, or boxed it. Reverting to the value-type is unboxing).

Second, I'd wager that the majority of your code isn't in the unboxing, instead it's in the multiple calls into the cache dictionary:

if (cast_cache.ContainsKey(id) && cast_cache[id].ContainsKey(key))            
    return (int)cast_cache[id][key]

I count 5 dictionary traverses in there: cast_cache.ContainsKey(id), cast_cache[id], .ContainstKey(key), cast_cache[id], and [key].

That's pretty harsh. You could cut down on a lot of these by using an aggregated key. Instead of looking for [id][key], combine those into a single object. That would cut your number of dictionaries down exponentially and cut those lookups down to 2, 1 if you skip the ContainsKey() with a try/catch (check the speed on that).

Here's a class that would allow you to combine those:

public class Vector
{
    private object[] _Data;

    public object this[int index]
    {
        get
        {
            return _Data[index];
        }
    }

    public Vector(params object[] data)
    {
        _Data = (object[])data.Clone();
    }

    public override bool Equals(object obj)
    {
        Vector OtherVector = obj as Vector;

        if (OtherVector == null)
            return false;

        if (OtherVector._Data.Length != _Data.Length)
            return false;

        for (int I = 0; I < _Data.Length; I++)
            if (!_Data[I].Equals(OtherVector._Data[I]))
                return false;

        return true;
    }

    public override int GetHashCode()
    {
        int Result = 0;
        for (int I = 0; I < _Data.Length; I++)
            Result = Result ^ (_Data[I].GetHashCode() * I);

        return Result;
    }
}

Try that out and see how your speed goes

Upvotes: 0

adrianbanks
adrianbanks

Reputation: 82994

Your code may perform quicker with the cache if you reduced the number of dictionary lookups. Additionally, making the cache Dictionary<string, int> instead of Dictionary<string, object> will avoid boxing and unboxing which is also costly.

public int GetInt(string id, string key)
{
    int value;
    Dictionary<string, int> cache;

    if (cast_cache.TryGetValue(id, out cache) 
          && cache.TryGetValue(key, out value))
    {
        return value;
    }

    if (int.TryParse(map[id][key], out value))
    {
        this.AddToCache(id, key, value);
        return value;
    }

    throw new InvalidOperationException("Trying to obtain a non-integer value with GetInt().");
}

Upvotes: 0

Daniel Pryden
Daniel Pryden

Reputation: 60987

Hang on, what you're calling a "cast" in your code is not a cast.

If you're doing this:

bool success = int.TryParse(map[id][key], out value);

That's a conversion, not a cast. A cast would be:

value = (int) map[id][key];

or

value = map[id][key] as int;

Which would fail in this case because the object really is a string, not an int.

If you have a Dictionary<string, string> but you are really interested in storing arbitrary objects in it, I would make it a Dictionary<string, object>. As a result you'll be able to use casts instead of conversions, which as Andrew Hare points out is going to be a lot faster.

Upvotes: 2

Andrew Hare
Andrew Hare

Reputation: 351566

Casting is much faster that most people seem to think since you are not touching the object itself (you are simply changing the reference that points to that object).

One of the main reasons that you ought to avoid casting is that when you cast you are eschewing type-safety and introducing potential execution-time errors into your application. I would rarely consider casting to be a performance concern.

As a side note I would make sure that you test both reference types and value types in your cache to make sure that you are not incurring a performance penalty due to the boxing and unboxing of any value types.

The performance penalty of boxing comes from the fact that casting a value type to an object does involve more than changing a reference as the value type must be copied to the heap. Also the use of a boxed value type will unbox the reference type and subsequently copy those values from the heap to the stack again.

Upvotes: 12

mcandre
mcandre

Reputation: 24642

Your example works faster because of the relationship between time and space. How much memory does your program need if you continue caching every hash type?

Upvotes: -2

Related Questions