Discarding Intermediate results in Linq

Question

I'm using an in-house file parsing library to parse a gnarly reporting file generated by a legacy system. The library lets you define Linq queries, which are applied successively to return an enumerable set of structures in the file.

A typical example would be something like the below.

var OrderReportParser = 
    from blanks in Rep(BlankLine) // One or more blank lines
    from header1 in TextLine      // A header text line with no useful information
    from hyphenLine in WhiteSpace.And(Rep(Char('-'))).And(BlankLine)
                                  // A line containing only hyphens
    /* ... Snip ... lots of other from clauses */
    from orderId in WhiteSpace.And(AlphaNumeric) // The id of this record
    from price in WhiteSpace.And(Decimal)        // Order price
    from quantity in WhiteSpace.And(Integer)     // Order quantity
    select new OrderLine(orderId, price, quantity);

Because much of my file is simply text, many of the intermediate results generated by a statement such as the above are not required in the output (such as the variables blanks, header1, hyphenLine in the example above).

Is there any mechanism in C# to avoid creating variables for the intermediate results, or do I always have to create a variable for each?

I am thinking of examples such as F#'s _ variable, which can be used in this fashion. See "F#'s underscore: why not just create a variable name?" for an example in the context of tuples.

Magnus Grindal Bakken · Accepted Answer

If you're asking if it's possible to do something like this:

var OrderReportParser = 
    from Rep(BlankLine)
    from TextLine
    from WhiteSpace.And(Rep(Char('-'))).And(BlankLine)
    ...

...then the answer is no. The designers of Linq probably never imagined that people would want to select something and then immediately throw it away without looking at it, since with most other Linq providers this parser syntax would create a huge Cartesian product. (Or if they did think about it they didn't consider it to be a useful enough feature.)

Why do you want to get rid of the variable names anyway? Personally I think including the variable names is likely to make the intent of the code clearer. If the unused variables bother you so much I guess you can name them something like _1, _2 or dummy1, dummy2, etc. That should make it pretty clear that they aren't used for anything. But they have to be there.
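
To make the naming suggestion concrete, here is a minimal sketch of the idea. The Parser delegate, the Literal and Integer parsers, and the Demo class are all hypothetical stand-ins written for this illustration; they are not the API of your in-house library, which I haven't seen. The point is only that each from clause needs its own distinct range variable, and that names like _1 and _2 signal that the value is ignored.

```csharp
using System;

// Hypothetical parser type: returns the parsed value plus the remaining input,
// or null when the parser does not match.
public delegate (T Result, string Rest)? Parser<T>(string input);

public static class ParserExtensions
{
    // Needed for queries with a single from clause.
    public static Parser<U> Select<T, U>(this Parser<T> p, Func<T, U> f) =>
        input =>
        {
            var r = p(input);
            if (r == null) return null;
            return (f(r.Value.Result), r.Value.Rest);
        };

    // Needed for queries with several from clauses: run p, then the next parser
    // on whatever input is left over.
    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> p, Func<T, Parser<U>> next, Func<T, U, V> project) =>
        input =>
        {
            var r1 = p(input);
            if (r1 == null) return null;
            var r2 = next(r1.Value.Result)(r1.Value.Rest);
            if (r2 == null) return null;
            return (project(r1.Value.Result, r2.Value.Result), r2.Value.Rest);
        };
}

public static class Parsers
{
    // Matches an exact piece of text (assumed helper, for illustration only).
    public static Parser<string> Literal(string text) =>
        input =>
        {
            if (!input.StartsWith(text, StringComparison.Ordinal)) return null;
            return (text, input.Substring(text.Length));
        };

    // Matches one or more decimal digits.
    public static readonly Parser<int> Integer =
        input =>
        {
            int i = 0;
            while (i < input.Length && char.IsDigit(input[i])) i++;
            if (i == 0) return null;
            return (int.Parse(input.Substring(0, i)), input.Substring(i));
        };
}

public static class Demo
{
    // _1 and _2 are ordinary range variables; the names just flag them as unused.
    public static readonly Parser<int> Quantity =
        from _1 in Parsers.Literal("-----\n")
        from _2 in Parsers.Literal("qty: ")
        from n in Parsers.Integer
        select n;

    public static void Main()
    {
        var r = Quantity("-----\nqty: 42 rest");
        Console.WriteLine(r?.Result); // prints 42
    }
}
```

The unused values still get carried through the query behind the scenes, but none of them appear in the final select, so the output type stays clean.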

Edit: I had an inkling that the "anonymous variable" _ from languages like F# might've been what you were driving at. The answer is still no, I'm afraid. You could name the first variable _, but you wouldn't be allowed to redefine it on the second line, so that would be it. Also, the _ variable wouldn't have the special semantics that it has in F#, so you would essentially be pretending that C# has a feature that it doesn't. Keep in mind that C# is fundamentally an imperative language. It has a lot of functional-style features, such as Linq, but it's still very much a language in the C/Java tradition, where these sorts of pattern matching features have not yet made many inroads. I like those features too, but you have to think a little differently when writing C#.
