Miguel
Miguel

Reputation: 3496

IEnumerable<T> and "yield return" performance question

Good afternoon,

I am writing a simple lexer which is basically a modified version of this one. After getting each token I need to perform slight modifications and re-analyse it to re-check it's type. Also, of course, after the lexical analysis I need to re-use the whole token list to make a kind of "parsing" on it. My question is if using IEnumerable<Token> and yield return statements in the lexer can make the whole program's performance slower... Would it be preferable to use a List<Token>, to build the list iteratively and use a normal return statement? What about iterating throught the IEnumerable/List? Which one is faster?

Thank you very much.

Upvotes: 3

Views: 1890

Answers (4)

dark_perfect
dark_perfect

Reputation: 1478

By now, I'm sure you'll see that you're trying to optimize prematurely, which is, according to many, the root of all evil.

However, if you REALLY want to speed this up, regular expressions seem an expensive way to do it. Everytime you do a Regex.Match(), you're scanning the string again, which results in at least as many scans as you have tokens.

If you know the boundaries that define a token ('{' and '}', for example), you could scan the string once to build the enumerable of tokens (with yield, or list, I don't think that'll make much difference). The caller can then rebuild the string, looking up the values to replace the tokens with.

Of course, this would only work with simple "search and replace" type tokens. More complex ones would require something more sophisticated, such as a regex. Perhaps you could extend the TokenDefinition to specify whether the match is a simple one or a regex one. This would cut down the number of regular expressions performed, but still keep the flexibility required.

Upvotes: 0

Hans Passant
Hans Passant

Reputation: 941317

You are asking the wrong question, you should be worried far more about the cost of Regex. Enumerating the tokens will be a very small fraction of that, there's just no point in optimizing code that could be double as fast but only improves program perf by 1%.

Write the code, profile it, you'll know what to do for version 2. Given that these kind of tools run at 'human time' (no perceptible difference when the program takes twice as long when it needs 20 milliseconds), the most likely result is "nothing needs done".

Upvotes: 6

ichen
ichen

Reputation: 512

IEnumerable and yield return statements are converted into an GetEnumator() and the implementation of an enumerator in IL code.

Although yield return has its merits in terms of doing some additional work for each token returned during enumeration, I'd stick to List creation and returning the list as it incurs less method calls and therefore should be faster.

Upvotes: 1

Jon Skeet
Jon Skeet

Reputation: 1500075

It's possible that it will have some performance impact - but it also allows the iterator to be built lazily.

Personally I'd write the code in the most readable way and measure its performance - then start worrying micro-optimizing this sort of thing. Test it one way, test it the other way, see how much readability you lose (if any) by using the most performant solution, and how much speed you actually gain.

Note that there's a very slight performance benefit to iterating over an expression which is known to be of type List<T> vs iterating over an IEnumerable<T> which happens to be implemented by List<T>, as List<T> implements the iterator itself using a mutable struct... basically you'll end up with a boxed value if you use the higher abstraction layer, but in that particular case I would almost certainly prefer using the right abstraction over the tiny performance improvement.

Upvotes: 3

Related Questions