When querying with LINQ-to-XML, is it better/more efficient to leave element values as strings or convert them to the correct type?

Question

I'm constantly running up against this when writing queries with LINQ-to-XML: the Value property of an XElement is a string, but the data may actually be an integer, boolean, etc.

Let's say I have a "where" clause in my query that checks if an ID stored in an XElement matches a local (integer) variable called "id". There are two ways I could do this.

1. Convert "id" to string

string idString = id.ToString();
IEnumerable<XElement> elements =
    from
        b in TableDictionary["bicycles"].Elements()
    where
        b.Element(_ns + "id").Value == idString
    select
        b;

2. Convert element value to int

IEnumerable<XElement> elements =
    from
        b in TableDictionary["bicycles"].Elements()
    where
        int.Parse(b.Element(_ns + "id").Value) == id
    select
        b;

I like option 2 because it does the comparison on the correct type. Technically, I could see a scenario where converting a decimal or double to a string would cause me to compare "1.0" to "1" (which would be unequal) versus Decimal(1.0) to Decimal(1) (which would be equal). Although a where clause involving decimals is probably pretty rare, I could see an OrderBy on a decimal column--in that case, this would be a very real issue.
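
To make the "1.0" versus "1" concern concrete, here is a minimal sketch. The `prices`/`price` element names and the class name are made up for illustration and are not part of my real data. It compares the same values once as strings and once as decimals, and sorts them both ways.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

public static class DecimalComparisonSketch
{
    // Hypothetical sample data, not the real "bicycles" table.
    public static readonly XElement Prices = XElement.Parse(
        "<prices><price>1.0</price><price>10</price><price>9.5</price></prices>");

    // String comparison: the text "1.0" is not the text "1".
    public static bool EqualAsStrings(XElement e, decimal value) =>
        e.Value == value.ToString(CultureInfo.InvariantCulture);

    // Typed comparison: 1.0m and 1m are the same decimal value.
    public static bool EqualAsDecimals(XElement e, decimal value) =>
        (decimal)e == value;

    // Ordering by the raw text sorts character by character.
    public static string[] SortedAsStrings() =>
        Prices.Elements("price")
              .OrderBy(p => p.Value, StringComparer.Ordinal)
              .Select(p => p.Value)
              .ToArray();

    // Ordering by the converted value sorts numerically.
    public static string[] SortedAsDecimals() =>
        Prices.Elements("price")
              .OrderBy(p => (decimal)p)
              .Select(p => p.Value)
              .ToArray();
}
```

With this data the string ordering puts "10" ahead of "9.5", while the decimal ordering puts it last, which is the OrderBy problem described above.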

A potential downside of this strategy, however, is that parsing tons of strings in a query could result in a performance hit (although I have no idea if it would be significant for a typical query). It might be more efficient to only parse element values when there is a risk that a string comparison would result in a different result than a comparison of the correct value type.

So, do you parse your element values religiously or only when necessary? Why?

Thanks!

EDIT:

I discovered a much less cumbersome syntax for doing the conversion.

3. Cast element to int

IEnumerable<XElement> elements =
    from
        b in TableDictionary["bicycles"].Elements()
    where
        (int)b.Element(_ns + "id") == id
    select
        b;

I think this will be my preferred method from now on...unless someone talks me out of it :)
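
A short sketch of how the cast behaves, assuming a hypothetical `bicycle` record without the namespace (`_ns`) used in my real queries. XElement defines explicit conversions to the common value types and to their nullable counterparts; the nullable form is the one to reach for when an element may be missing.

```csharp
using System;
using System.Xml.Linq;

public static class ElementCastSketch
{
    // Hypothetical record; element names are illustrative only.
    public static readonly XElement Bike = XElement.Parse(
        "<bicycle><id>42</id><inStock>true</inStock></bicycle>");

    // Explicit conversion operators defined on XElement.
    public static int Id() => (int)Bike.Element("id");
    public static bool InStock() => (bool)Bike.Element("inStock");

    // The nullable cast yields null when the element does not exist.
    public static int? GearCount() => (int?)Bike.Element("gears");

    // The non-nullable cast throws when the element does not exist.
    public static bool NonNullableCastThrowsWhenMissing()
    {
        try
        {
            int unused = (int)Bike.Element("gears");
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }
}
```

By comparison, option 2 (`int.Parse(b.Element(...).Value)`) fails with a NullReferenceException on a missing element, because `.Value` is read from a null reference before any parsing happens.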

EDIT II:

It occurred to me since posting my question that: THIS IS XML. If I really had enough data for performance to be an issue, I would probably be using a real database. So, yet another reason to go with casting.

Foredecker · Accepted Answer

It's difficult to assess the performance issues here without measuring. But I think you have two scenarios.

If you will need most (or all) of the values in an expression sooner or later, then it is probably best to pay the CPU cost of converting to native types up front and discard the XML string data early.
If you are only going to touch (evaluate or use) a few of the values, then it will most likely be cheaper in CPU time to convert the string data to native types lazily, at or near the point of consumption.

Now, these are just the CPU time considerations. The data itself will likely take up considerably less memory once converted to native value types, and converting is what lets you discard the string (XML) data early.

In short, it is rare for questions like this to have black or white answers: it will depend on your scenario, the complexity of the data, how much data there is, and when it will be used (touched or evaluated).

Update

In Dan's comment to my original answer, he asked for a general rule of thumb for cases where there is no time or reason to do detailed measurements.

My suggestion is to prefer conversion to native types at XML parsing time, rather than keeping the string data around and parsing lazily. Here is my reasoning:

The code will already be burning some CPU, I/O, and memory resources at parsing time.
The code is likely to be simpler if the conversions are done at load time (rather than at another time), as this can all be coded in a simple procedural way.
This is likely to be more memory efficient as well.
When the data needs to be used, it is already in a native format - this will be much better performing than dealing with string data at consumption time: comparisons and computation with native types will usually be much more efficient than dealing with data in string format. This is likely to keep the consuming code simpler as well.
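
As a rough illustration of the load-time approach, here is a minimal sketch. The `Bicycle` class, the `BicycleLoader` helper, and the `model` and `price` elements are assumptions made up for this example, since the question only shows an `id` element and omits the real schema.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical typed model for one row of the XML "table".
public sealed class Bicycle
{
    public int Id { get; set; }
    public string Model { get; set; } = "";
    public decimal Price { get; set; }
}

public static class BicycleLoader
{
    // Converts every element to native types once, at load time.
    // After this call the XElement tree is no longer needed.
    public static List<Bicycle> Load(XElement table) =>
        table.Elements("bicycle")
             .Select(b => new Bicycle
             {
                 Id = (int)b.Element("id"),
                 // The string conversion returns null for a missing element.
                 Model = (string)b.Element("model") ?? "",
                 Price = (decimal)b.Element("price")
             })
             .ToList();
}
```

Later queries then work on native types only, for example `bikes.Where(b => b.Id == id)` or `bikes.OrderBy(b => b.Price)`, with no string handling at the point of use.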

Again, I'm suggesting this as a rule of thumb :) There will be scenarios where another approach is more optimal from a performance standpoint, or will make the code 'better' in some way (more cohesive, modular, easier to maintain, etc).

This is one of those cases where you will most likely need to measure the results to be sure you are doing the right thing.
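
If you do want to measure, a crude timing harness along these lines may be enough to tell whether the difference matters for your data. It is a sketch only: the class name, the generated test data, and the repetition count are assumptions, and a proper benchmarking tool would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

public static class QueryTimingSketch
{
    // Builds hypothetical test data: one <bicycle><id>i</id></bicycle> per row.
    public static XElement BuildTable(int rows) =>
        new XElement("bicycles",
            Enumerable.Range(0, rows).Select(i =>
                new XElement("bicycle", new XElement("id", i))));

    // Option 1 from the question: compare the element text to a string.
    public static int CountByString(XElement table, int id)
    {
        string idString = id.ToString();
        return table.Elements().Count(b => b.Element("id").Value == idString);
    }

    // Option 3 from the question: cast the element to int.
    public static int CountByCast(XElement table, int id) =>
        table.Elements().Count(b => (int)b.Element("id") == id);

    // Runs the query once as a warm-up, then times the repetitions.
    public static TimeSpan Time(Func<int> query, int repetitions)
    {
        query();
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < repetitions; i++)
        {
            query();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

A call such as `QueryTimingSketch.Time(() => QueryTimingSketch.CountByCast(table, 42), 100)` can then be compared against the same call with `CountByString`, using a table sized like your real data.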
