Reputation: 77
I'm working on a personal project where I'm in need of some help with a performant Linq query to a database. The DB in question could have millions of log entries, and through an API (Asp) I want to have an option to only return a representative subset of those logs to a graphical interface.
Here's the method in question:
public IEnumerable<Log> GetByParameter(int ParameterID,DateTime timeStart, DateTime timeEnd)
{
return _context.Logs.Where
(a => a.ParameterID == ParameterID &&
(DateTime.Compare(a.LogDate,timeStart) > 0 && DateTime.Compare(a.LogDate,timeEnd) < 0)).ToList();
}
Note that the method takes in two DateTimes as parameters, which yield a range of time where the logs should be queried.
I would want to augment this method like so:
public IEnumerable<Log> GetByParameter(int ParameterID,DateTime timeStart, DateTime timeEnd, int limit)
For example, the DB might contain 2 million entries given the parameters passed, and the "limit" of the consumer of the API might be 40 000 entries. Thus:
numberOfEntries/limit = n
2*106 / 4*104 = 50
In this example I would want to return every 50th element to the consumer of the API, with an evenly spaced time interval between the elements.
An easy way would just be to query the entire table given the parameters and then filter out afterwards, but that seems messy and a bit antithetical to this approach, possibly very ineffective as well.
So here is my question: Is there any way to write a query such that it only queries the DB for every Nth row?
Thanks in advance!
Upvotes: 1
Views: 73
Reputation: 5021
You can implement it using SQL Server window functions like row_number:
WITH x AS
(
SELECT ROW_NUMBER() over (order by LogDate) as rn, *
FROM MyTable
WHERE
ParameterID = @ParameterID AND
LogDate > @StartDate AND
LogDate < @EndDate
)
SELECT * from X WHERE rn % 50 = 0
In LINQ you can try to use the following clause:
var data = _context.Logs
.Select((x, i) => new { Data = x, Number = i })
.Where(x => x.Number % 50 == 0)
.Select(x => x.Data);
But it's necessary to check actual execution plan, I guess that it will not be optimal.
Don't forget to create an index on LogDate.
Honestly I'm not sure that SQL Server is a good choice to store logs, I would like to use something like Elastic.
Upvotes: 1
Reputation: 9002
An approach you could take is using modulus on some kind of index. If you already have an auto generated Id
, that could be used - but it's not ideal as you can't rely on it being continuous.
You could use RANK()
to create an index column within a view, but unfortunately you can't use RANK()
directly from EF code.
Something like the following:
var interval = 5;
return _context.Logs
.Where(a =>
a.ParameterID == ParameterID &&
(
DateTime.Compare(a.LogDate,timeStart) > 0 &&
DateTime.Compare(a.LogDate,timeEnd) < 0) &&
a.Id % interval == 0).ToList(); //Filter on modulus of an index
In this instance however I personally would write the query in SQL.
Upvotes: 1