John von No Man

Querying external data source with LINQ

I'm storing what basically amounts to log data in CSV files. Each line has the format <datetime>,<val1>,<val2>, etc. However, the log files are split by account ID and month, so a query that spans several months or account IDs has to read multiple files.

I'd like to be able to query it with LINQ, e.g. logFiles.Where(o => o.Date > new DateTime(2017, 1, 1) && o.Date < new DateTime(2017, 4, 1)). I suppose I'll need something that examines the date range in that query, notices that it covers January through March, and then opens only the files for those months.

Is there any way to do this that does not involve getting my hands very dirty with a custom IQueryable LINQ provider? I can go down that rabbit hole if necessary, but I want to make sure it's the right rabbit hole first.
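If the date range is passed in explicitly rather than read out of a LINQ expression, the file-selection half does not need a provider at all. A minimal sketch, assuming one file per account and month named like `4711_2017-01.csv` (the naming scheme and the `LogFileSelector` class are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LogFileSelector
{
    // Yields the first day of every month that overlaps the half-open range [from, to).
    // A condition like o.Date > from is stricter than [from, ...), so no month that
    // could contain a match is skipped.
    public static IEnumerable<DateTime> MonthsInRange(DateTime from, DateTime to)
    {
        for (var month = new DateTime(from.Year, from.Month, 1); month < to; month = month.AddMonths(1))
            yield return month;
    }

    // Hypothetical naming scheme: one file per account and month, e.g. "4711_2017-01.csv".
    // The invariant culture keeps "yyyy" Gregorian regardless of the machine's locale.
    public static IEnumerable<string> FileNamesFor(int accountId, DateTime from, DateTime to)
    {
        foreach (DateTime month in MonthsInRange(from, to))
            yield return $"{accountId}_{month.ToString("yyyy-MM", CultureInfo.InvariantCulture)}.csv";
    }
}
```

For a range from 1 January 2017 up to (but excluding) 1 April 2017 this yields January, February and March, so only three files per account would be opened.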


Answers (1)

Heinzi

If you want to filter both on the log file name and on the log file contents in the same Where expression, I don't see a solution without a custom IQueryable LINQ provider, because that is exactly what they are for: accessing data intelligently based on the expressions used in the LINQ query.
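To give a sense of what that involves, here is a rough sketch of one small piece of such a provider: inspecting the predicate's expression tree to find the date bounds, so that only the matching monthly files need to be opened. It is deliberately limited (only `&&`-joined comparisons against a `Date` member are recognized), and the `LogRow` and `DateBounds` names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical row type; only the Date member matters here.
public sealed record LogRow(DateTime Date, int val1, int val2);

public static class DateBounds
{
    // Returns the bounds a predicate places on its parameter's Date member
    // (null where none can be derived). Only comparisons joined by && are used;
    // anything under || or ! is ignored, since a bound found there is not safe
    // to use for skipping files.
    public static (DateTime? Lower, DateTime? Upper) Find(Expression<Func<LogRow, bool>> predicate)
    {
        DateTime? lower = null, upper = null;
        Collect(predicate.Body, ref lower, ref upper);
        return (lower, upper);
    }

    private static void Collect(Expression e, ref DateTime? lower, ref DateTime? upper)
    {
        if (e is not BinaryExpression b) return;
        switch (b.NodeType)
        {
            case ExpressionType.AndAlso:
                Collect(b.Left, ref lower, ref upper);
                Collect(b.Right, ref lower, ref upper);
                break;
            case ExpressionType.GreaterThan:
            case ExpressionType.GreaterThanOrEqual:
                // o.Date > x gives a lower bound; x > o.Date gives an upper bound.
                if (IsDate(b.Left) && TryEval(b.Right, out DateTime v1)) lower = Later(lower, v1);
                else if (IsDate(b.Right) && TryEval(b.Left, out DateTime v2)) upper = Earlier(upper, v2);
                break;
            case ExpressionType.LessThan:
            case ExpressionType.LessThanOrEqual:
                // o.Date < x gives an upper bound; x < o.Date gives a lower bound.
                if (IsDate(b.Left) && TryEval(b.Right, out DateTime v3)) upper = Earlier(upper, v3);
                else if (IsDate(b.Right) && TryEval(b.Left, out DateTime v4)) lower = Later(lower, v4);
                break;
        }
    }

    private static DateTime Later(DateTime? a, DateTime b) => a is DateTime x && x > b ? x : b;
    private static DateTime Earlier(DateTime? a, DateTime b) => a is DateTime x && x < b ? x : b;

    // True for o.Date, where o is the lambda parameter.
    private static bool IsDate(Expression e) =>
        e is MemberExpression m && m.Member.Name == "Date"
        && m.Expression is ParameterExpression && e.Type == typeof(DateTime);

    // Evaluates a sub-expression that does not depend on the lambda parameter,
    // e.g. new DateTime(2017, 1, 1) or a captured local variable.
    private static bool TryEval(Expression e, out DateTime value)
    {
        value = default;
        if (e.Type != typeof(DateTime) || new ParameterCheck().Uses(e)) return false;
        value = Expression.Lambda<Func<DateTime>>(e).Compile()();
        return true;
    }

    private sealed class ParameterCheck : ExpressionVisitor
    {
        private bool _found;
        public bool Uses(Expression e) { Visit(e); return _found; }
        protected override Expression VisitParameter(ParameterExpression node)
        {
            _found = true;
            return node;
        }
    }
}
```

A real provider would additionally have to implement `IQueryable<T>` and `IQueryProvider`, hand these bounds to the file-reading code and still execute the rest of the expression, which is where most of the work lies.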

That said, a multi-step approach might be a worthwhile compromise:

  1. Use LINQ to restrict the log files to be searched,
  2. read the files and
  3. use LINQ for further searching.

Example:

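// Step 1: select the relevant files by date and account ID (no file contents are read)
IEnumerable<LogFile> files = LogFiles.Where(f => f.Date > new DateTime(2017, 1, 1) && f.AccountID == 4711);
// Step 2: read and parse the selected files into rows
IEnumerable<LogData> data = ParseLogFiles(files);
// Step 3: filter the rows with ordinary LINQ to Objects
IEnumerable<LogData> filteredData = data.Where(d => d.val1 == 42 && d.val2 > 17);
LogData firstMatch = filteredData.FirstOrDefault();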

If you implement ParseLogFiles (a) with deferred execution and (b) as an extension method on IEnumerable<LogFile>, the resulting code will look and feel very similar to pure LINQ:

var filteredData = LogFiles.
    Where(f => f.Date > new DateTime(2017, 1, 1) && f.AccountID == 4711).
    ParseLogFiles().
    Where(d => d.val1 == 42 && d.val2 > 17);

// If ParseLogFiles uses deferred execution, the following line won't read
// more log files than required to get the first matching row:
var firstMatch = filteredData.First();
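A minimal sketch of such a ParseLogFiles, written as an iterator extension method so that both requirements are met. The `LogFile` and `LogData` record shapes (a path, a month date and an account ID; a timestamp and two integer columns) are assumptions, since they are not defined here, and the parser assumes well-formed lines with an ISO-8601 timestamp.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical types: their exact shape is an assumption.
public sealed record LogFile(string Path, DateTime Date, int AccountID);
public sealed record LogData(DateTime Date, int val1, int val2);

public static class LogFileExtensions
{
    // Iterator method (deferred execution): nothing is opened until the result is
    // enumerated, and the next file is opened only when the consumer asks for a
    // row beyond the end of the previous one.
    public static IEnumerable<LogData> ParseLogFiles(this IEnumerable<LogFile> files)
    {
        foreach (LogFile file in files)
        {
            // File.ReadLines streams the file line by line (unlike File.ReadAllLines).
            foreach (string line in File.ReadLines(file.Path))
            {
                if (string.IsNullOrWhiteSpace(line)) continue;
                // Assumed format: <datetime>,<val1>,<val2>
                string[] parts = line.Split(',');
                yield return new LogData(
                    DateTime.Parse(parts[0], CultureInfo.InvariantCulture),
                    int.Parse(parts[1], CultureInfo.InvariantCulture),
                    int.Parse(parts[2], CultureInfo.InvariantCulture));
            }
        }
    }
}
```

Because the outer `foreach` only calls `File.ReadLines` when it reaches the next file, `First()` stops opening files as soon as a match is found, and disposing the iterator (which `First()` does) closes the file that was being read.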

It's a bit more work than a single LINQ query, but it saves you from implementing your own LINQ provider.
