Reputation: 1580
I am trying to figure out how to write a LINQ / Entity Framework query to return the latest data available for each symbol in a table.
My database table looks like this:
ID symbol price_date price
------------------------------------
1 AAPL 2022-02-28 174.50
2 MSFT 2022-02-28 307.20
3 AAPL 2022-03-01 172.23
4 MSFT 2022-03-01 304.15
Not every symbol has a record for every day, though. The ID key is sequential, so it is safe to rely on: the highest ID for a given symbol always holds the latest data.
If I was writing a SQL query, the following would return what I'm looking for:
select prices.*
from prices
where id in (select max(id) from prices group by symbol)
In LINQ, I'm having trouble making this into a single query. What I have so far splits it into two queries:
var maxIds = from pp in ctx.Prices
             group pp by pp.Symbol
             into maxIdBySymbol
             select maxIdBySymbol.Max(pp => pp.Id);
var latestPrices = ctx.Prices.Where(it => maxIds.Contains(it.Id)).ToList();
Is there a way to make this a single query in LINQ?
Thanks
Upvotes: 2
Views: 421
Reputation: 30512
Addition: the solution proposed here works, but it is less efficient than it could be.
For more information, see the addition at the end.
So you will first make groups of records, where every group contains only records for one specific symbol. So you will have one group that contains the records for symbol AAPL, one group that contains the records for symbol MSFT, etc.
You wrote: "I am trying ... query ... the latest data available for each symbol in a table."
So, once you've got the groups, you select one element from each group. According to your requirement you select the newest element, which is the element with the highest value for PriceDate. As you said, you could also take the element with the highest value for property ID. Personally I wouldn't do that, because in the far future your IDs might no longer be in ascending date order, for instance because you add a feature to edit PriceDate after an input error.
For this, I would use the overload of Queryable.GroupBy that has a parameter resultSelector. Use the resultSelector to select the one element of each group that you want.
var newestRecordPerSymbol = dbContext.PriceRecords
    // make groups of priceRecords with same value for property Symbol
    .GroupBy(priceRecord => priceRecord.Symbol,
        // parameter resultSelector: for every symbol and all priceRecords
        // that have this symbol, take the newest one
        // = order by descending PriceDate and take the first one
        (symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
            .OrderByDescending(priceRecord => priceRecord.PriceDate)
            .FirstOrDefault());
In words: from the table of PriceRecords, make groups of PriceRecords that have the same value for property Symbol. From every combination of Symbol, and PriceRecords that have this symbol, order all PriceRecords by descending value for property PriceDate, and keep only the first one.
Every group has at least one element, so you could have used First as well as FirstOrDefault. Some versions of Entity Framework or DBMS have problems using First. If you encounter this problem, use FirstOrDefault.
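To try the grouping logic without a database, here is a minimal in-memory sketch using LINQ to Objects. The PriceRecord class, the Demo class and the sample rows are assumptions made up for illustration; they only mirror the columns in the question. Because it runs in memory, it says nothing about how a particular Entity Framework version translates the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity mirroring the columns of the table in the question.
public class PriceRecord
{
    public int Id { get; set; }
    public string Symbol { get; set; } = "";
    public DateTime PriceDate { get; set; }
    public decimal Price { get; set; }
}

public static class Demo
{
    // Group by Symbol; the resultSelector keeps the record with the latest PriceDate.
    public static List<PriceRecord> NewestPerSymbol(IEnumerable<PriceRecord> records)
    {
        return records
            .GroupBy(r => r.Symbol,
                (symbol, recordsWithThisSymbol) => recordsWithThisSymbol
                    .OrderByDescending(r => r.PriceDate)
                    .First()) // every group has at least one element
            .ToList();
    }

    public static void Main()
    {
        // Illustrative rows only, not real market data.
        var records = new List<PriceRecord>
        {
            new PriceRecord { Id = 1, Symbol = "AAPL", PriceDate = new DateTime(2022, 2, 28), Price = 174.50m },
            new PriceRecord { Id = 2, Symbol = "MSFT", PriceDate = new DateTime(2022, 2, 28), Price = 307.20m },
            new PriceRecord { Id = 3, Symbol = "AAPL", PriceDate = new DateTime(2022, 3, 1), Price = 172.23m },
        };

        foreach (var r in NewestPerSymbol(records))
        {
            Console.WriteLine($"{r.Id} {r.Symbol} {r.PriceDate:yyyy-MM-dd}");
        }
    }
}
```

In this sketch MSFT has no row for the second day, so its only record is returned, which matches the "not every symbol has a record for every day" case from the question.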
If you still want to take the one with the highest ID:
    .OrderByDescending(priceRecord => priceRecord.Id)
    .FirstOrDefault());
Addition: in the solution above, all records in each group are sorted, but only the first one is taken. It is a bit of a waste to sort the second, third, etc. element if you will only take the first one.
In the LINQ query from your question you'll see code like:
select maxIdBySymbol.Max(pp => pp.Id);
So, not all elements are sorted. The sequence is enumerated only once, and the largest one is returned. This is far more efficient than sorting elements that you won't use anyway.
To create code like this, we need to change parameter resultSelector of the GroupBy. Let's use a method like Max(propertySelector), or one of the overloads of Queryable.Aggregate. Something like this:
// parameter resultSelector: keep only the largest Id of each group
(symbol, priceRecordsWithThisSymbol) => priceRecordsWithThisSymbol
    .Max(record => record.Id));
Alas, although the guys from Entity Framework did a tremendous job, this overload of the Max method is not supported, and neither are any of the Aggregate methods. See List of Supported and Unsupported LINQ methods.
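For comparison, the single-pass idea can be sketched on an in-memory sequence with Enumerable.Aggregate, which walks each group once and keeps the record with the largest Id instead of sorting. This is LINQ to Objects only; it is not a claim that any Entity Framework version translates it to SQL. The PriceRecord and Demo names are assumptions made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; only the properties needed for the sketch.
public class PriceRecord
{
    public int Id { get; set; }
    public string Symbol { get; set; } = "";
    public decimal Price { get; set; }
}

public static class Demo
{
    // For each symbol, enumerate the group once and keep the record with the largest Id.
    public static List<PriceRecord> LatestPerSymbol(IEnumerable<PriceRecord> records)
    {
        return records
            .GroupBy(r => r.Symbol,
                (symbol, recordsWithThisSymbol) => recordsWithThisSymbol
                    // No seed needed: a group is never empty.
                    .Aggregate((best, next) => next.Id > best.Id ? next : best))
            .ToList();
    }

    public static void Main()
    {
        // Illustrative rows only.
        var records = new List<PriceRecord>
        {
            new PriceRecord { Id = 1, Symbol = "AAPL", Price = 174.50m },
            new PriceRecord { Id = 2, Symbol = "MSFT", Price = 307.20m },
            new PriceRecord { Id = 3, Symbol = "AAPL", Price = 172.23m },
        };

        foreach (var r in LatestPerSymbol(records))
        {
            Console.WriteLine($"{r.Id} {r.Symbol}");
        }
    }
}
```

If the data already lives in a database, running this would mean pulling all rows into memory first, so it is mainly useful for seeing what the sorted version computes with less work per group.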
Upvotes: 1
Reputation: 77045
You can combine Where with a negated Any:
ctx.Prices.Where(prices1 => !ctx.Prices.Any(prices2 =>
    prices2.Id > prices1.Id && prices2.Symbol == prices1.Symbol))
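The logic reads as "keep a row if no other row with the same symbol has a higher Id". A minimal in-memory sketch of that filter, assuming a hypothetical PriceRecord class and made-up sample rows, looks like this. Against a database the provider would normally be expected to turn the negated Any into a NOT EXISTS subquery, but that depends on the Entity Framework version and is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity; only the properties needed for the sketch.
public class PriceRecord
{
    public int Id { get; set; }
    public string Symbol { get; set; } = "";
    public decimal Price { get; set; }
}

public static class Demo
{
    // Keep a record only if no record with the same Symbol has a higher Id.
    public static List<PriceRecord> LatestPerSymbol(List<PriceRecord> prices)
    {
        return prices
            .Where(p1 => !prices.Any(p2 => p2.Id > p1.Id && p2.Symbol == p1.Symbol))
            .ToList();
    }

    public static void Main()
    {
        // Illustrative rows only.
        var prices = new List<PriceRecord>
        {
            new PriceRecord { Id = 1, Symbol = "AAPL", Price = 174.50m },
            new PriceRecord { Id = 2, Symbol = "MSFT", Price = 307.20m },
            new PriceRecord { Id = 3, Symbol = "AAPL", Price = 172.23m },
            new PriceRecord { Id = 4, Symbol = "MSFT", Price = 304.15m },
        };

        foreach (var p in LatestPerSymbol(prices))
        {
            Console.WriteLine($"{p.Id} {p.Symbol}");
        }
    }
}
```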
Upvotes: 1