Reputation: 775
I have a list of objects which contain a date and value. There is one object per date, and an object for every date for the past several months. I am looking for the date where the value was changed to the most recent value.
Here is an example of what I mean:
<datevalue>
<date>8-9</date>
<value>5</value>
</datevalue>
<datevalue>
<date>8-10</date>
<value>6</value>
</datevalue>
<datevalue>
<date>8-11</date>
<value>5</value>
</datevalue>
<datevalue>
<date>8-12</date>
<value>5</value>
</datevalue>
<datevalue>
<date>8-13</date>
<value>5</value>
</datevalue>
In the above example, the current value is 5 because it is the value on 8-13, the most recent date. I want to return the 8-11 datevalue object because it is the day on which the value was changed to the most recent value. I don't want the 8-9 value because even though its the earliest day with the current value, the value was changed after that date.
Here was my first attempt to solve this:
DateValue FindMostRecentValueChange(List<DateValue> dateValues)
{
var currentValue = dateValues
.OrderByDesc(d => d.date)
.Select(d => d.value)
.First();
var mostRecentChange = dateValues
.OrderByDesc(d => d.date)
.TakeWhile(d => d.value = currentValue)
.Last();
return mostRecentChange;
}
This works. However, it was pointed out to me that I was repeating the OrderByDesc for both operations. Considering that OrderByDesc could be an expensive operation, I wanted to not have to do it twice. Therefore I made a change:
DateValue FindMostRecentValueChange(List<DateValue> dateValues)
{
var orderedDateValues = dateValues.OrderByDesc(d => d.date);
var currentValue = orderedDateValues;
.Select(d => d.value)
.First();
var mostRecentChange = orderedDateValues
.TakeWhile(d => d.value = currentValue)
.Last();
return mostRecentChange;
}
Now I only call OrderByDesc once. It's an improvement, right? Well, maybe not. OrderByDesc is a delayed execution.
From what I understand, that means that the actual ordering isn't done until you ask for a value from it. So when you call First() while looking for the currentValue you perform the OrderByDesc, and then it's executed again when you call Last() while looking for the mostRecentChange. So does that mean that I'm still executing OrderByDesc twice?
Am I correctly interpreting how delayed execution operates? I would hope that the compiler would recognize this scenario and optimize it behind the scenes so that the execution is only called once, but I cannot find any information to support this theory. Can you help me wrap my head around the best way to optimize this solution?
Upvotes: 2
Views: 307
Reputation: 203817
So does that mean that I'm still executing OrderByDesc twice?
Yes, that is correct.
I would hope that the compiler would recognize this scenario and optimize it behind the scenes so that the execution is only called once, but I cannot find any information to support this theory.
It cannot, because that would change the intended functionality in several key ways.
If the underlying data changes, those changes should be reflected when iterating the sequence again. If you added a new item to dateValues
between the first query and the second, it should be there in the second query. If you removed an item, it shouldn't be there, etc.
To get what you're asking for it would need to store all of the items in some sort of collection, even after the first consumer was "done" with them. That is undesirable. The idea here is that you can stream the data, and that once you've finished processing an item you're "done" with it, and don't need to hold it in memory. What if you don't have enough memory to hold onto all of the items in the query for subsequent runs?
Can you help me wrap my head around the best way to optimize this solution?
It's quite trivial. Just populate a data structure with the results of the query. The easiest way to do this, just put them all in a list. Add a ToList
call to the end of the query and it will evaluate it once, and then the resulting list can be iterated many times without negative consequences. Since this solution, when such semantics are desired, is so easy to get, whereas the semantics of deferred execution are much harder to get, despite being more powerful, they chose not to base LINQ on materialized collections.
Upvotes: 3
Reputation: 63317
No, your queries will be executed right if you use First()
or Last()
and some others. That means you call the OrderBy
twice (including OrderByDescending
).
You can try this:
var mostRecentChange = dateValues.OrderBy(d=>d.Date)
.SkipWhile((x,i)=>i==dateValues.Count-1||x.Value == dateValues[i+1].Value)
.Take(1);
Upvotes: 0