Anonym

Reputation: 7725

LINQ (to objects): running several queries over the same IEnumerable?

Is it somehow possible to chain together several LINQ queries on the same IEnumerable?

Some background:

I have some files, 20-50 GB in size, that will not fit in memory. Some code parses messages from such a file, and basically does:

    public IEnumerable<Record> ReadRecordsFromStream(Stream inStream) {
        Record msg;
        while ((msg = ReadRecord(inStream)) != null) {
            yield return msg;
        }
    }

This allows me to perform interesting queries on the records, e.g. find the average duration of a Record:

    var records = ReadRecordsFromStream(stream);
    var avg = records.Average(x => x.Duration);

Or perhaps the number of records per hour/minute:

    var x = from t in records
            group t by t.Time.Hour + ":" + t.Time.Minute into g
            select new { Period = g.Key, Frequency = g.Count() };

And there are a dozen or so more queries I'd like to run to pull relevant info out of these records. Some of the simple queries can certainly be combined into a single query, but this seems to get unmanageable quite fast.

Now, each time I run one of these queries, the file has to be read from the beginning again and all records reparsed. Parsing a 20 GB file 20 times takes time, and is a waste.

What can I do to make just one pass over the file, but run several LINQ queries against it?

Upvotes: 2

Views: 277

Answers (3)

eSPiYa

Reputation: 950

I have done this before for logs of 3-10 MB per file. I haven't reached your file sizes, but I ran it over 1 GB+ of log files in total without consuming much RAM. You may try what I did.

Upvotes: 0

Jon Skeet

Reputation: 1500185

You might want to consider using Reactive Extensions for this. It's been a while since I've used it, but you'd probably create a Subject<Record>, attach all your queries to it (as appropriate IObservable<T> variables) and then hook up the data source. That will push all the data through the various aggregations for you, only reading from disk once.

While the exact details elude me without downloading the latest build myself, I blogged on this a couple of times: part 1; part 2. (Various features that I complained about being missing in part 1 were added :)
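To make the idea concrete, here is a rough sketch of how the two queries from the question might be attached to a Subject<Record>. It is not tested against any particular Rx build: it assumes the System.Reactive package, and the Record class and the SinglePass.Run helper are hypothetical stand-ins made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

// Hypothetical stand-in for the question's Record type.
public sealed class Record
{
    public DateTime Time { get; set; }
    public double Duration { get; set; }
}

public static class SinglePass
{
    // Pushes every record through all attached queries while enumerating
    // the source exactly once. Assumes the source has at least one record:
    // Average signals an error on an empty sequence.
    public static (double Average, Dictionary<string, int> PerMinute) Run(
        IEnumerable<Record> records)
    {
        var subject = new Subject<Record>();
        double average = 0;
        var perMinute = new Dictionary<string, int>();

        // Query 1: average duration. The value arrives once the subject completes.
        subject.Average(r => r.Duration)
               .Subscribe(avg => average = avg);

        // Query 2: number of records per hour:minute, same key as in the question.
        subject.GroupBy(r => r.Time.Hour + ":" + r.Time.Minute)
               .SelectMany(g => g.Count()
                                 .Select(n => new { Period = g.Key, Frequency = n }))
               .Subscribe(x => perMinute[x.Period] = x.Frequency);

        // The only enumeration of the source: each record is pushed to every query.
        foreach (var record in records)
        {
            subject.OnNext(record);
        }

        // Completion is what makes the aggregates emit their results.
        subject.OnCompleted();

        return (average, perMinute);
    }
}
```

With something like this, `var (avg, perMinute) = SinglePass.Run(ReadRecordsFromStream(stream));` would read the file once. Further queries would be attached to the subject before the foreach loop, since anything subscribed afterwards misses the records already pushed.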

Upvotes: 5

lll

Reputation: 317

There's a technology that allows you to do this kind of thing. It's called a database :)

Upvotes: -1
