Reputation: 1145
I have an ASP.NET Core 3.1 API handling ~500 requests/second, and I'm doing pagination over a pure in-memory collection of roughly 700K to 1 million items. I'm only using the LINQ Where, OrderBy, Take and Skip operators, which is the simplest pagination approach. Let's say my class looks like this:
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime AgeUtc { get; set; }
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters)
{
    IEnumerable<ItemDto> items;
    if (watermark.HasValue)
    {
        items = MemoryCacheService.Instance.ReadOnlyItems
            .Where(x => x.Id > watermark)
            .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
            .Take(uiParameters.PageSize);
        return items;
    }
    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
        .Take(uiParameters.PageSize);
    return items;
}
My problem is the watermark value, which causes high randomization in URLs. I'm sure there is a better way to make this more efficient. Any help :)
Edit 1 ——————
We did a kind of partitioning (in memory) by splitting our collection into smaller lists, and the results are better. But I still believe we are on the wrong path. We tried Redis before the in-memory collection, and things got complicated when users grew from 1 to 1.6 million: a lot of network exceptions occurred, along with issues around the StackExchange.Redis multiplexer. With the in-memory solution we at least get no exceptions, but it's slow! SQL Azure is really expensive, and when we load-tested our API, SQL went down after 30 or 40 concurrent connections!
Edit 2 ——————
We modified the implementation above to look like this, and the performance became much better. But I still believe we are overcomplicating things and this is not the best way to do it.
public class MemoryCacheService
{
    ...
    public ImmutableSortedDictionary<int, List<ItemDto>> PartitionedReadOnlyItems { get; set; }
    public List<ItemDto> ReadOnlyItems { get; }
    ...
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters resourceParameters)
{
    IEnumerable<ItemDto> items;
    if (watermark.HasValue)
    {
        // Clamp the watermark if it is smaller than the first key in the partitioned list
        var firstKeyValue = MemoryCacheService.Instance.PartitionedReadOnlyItems.First().Key;
        if (watermark < firstKeyValue)
        {
            watermark = firstKeyValue - 1;
        }
        var skipper = (resourceParameters.PageNumber - 1) * resourceParameters.PageSize;
        var cursor = watermark.Value + skipper;
        var startKey = cursor - resourceParameters.PageSize;
        var endKey = cursor + resourceParameters.PageSize;
        // This logic fetches two pages and merges their values into a single list
        var patientLocationsPages = MemoryCacheService.Instance.PartitionedReadOnlyItems
            .Where(k => k.Key >= startKey && k.Key <= endKey)
            .Select(s => s.Value)
            .SelectMany(p => p.AsEnumerable());
        items = patientLocationsPages
            .Where(p => p.Id > cursor)
            .Take(resourceParameters.PageSize);
        return items;
    }
    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((resourceParameters.PageNumber - 1) * resourceParameters.PageSize)
        .Take(resourceParameters.PageSize);
    return items;
}
Upvotes: 1
Views: 139
Reputation: 26927
You don't explain how you are creating your ReadOnlyItems, but if you can use a SortedList<TKey,TValue> then you can greatly speed up your method. Add your Items to the sorted list using Item.Id as the key.
In my tests with a million random Items with an Id spread of 4:1, I get about 0.02-0.04 ms with SortedList versus 1.2-4 ms for the first lookup (about 100 times faster), and about 0.001 ms versus 3.5 ms for subsequent lookups (about 400 times faster).
Then, you can do:
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters) {
    IEnumerable<ItemDto> items;
    if (watermark.HasValue) {
        // IndexOfKey is a binary search; +1 moves past the watermark entry itself
        var firstIndex = MemoryCacheService.Instance.ReadOnlyItems.IndexOfKey(watermark.Value) + 1;
        items = MemoryCacheService.Instance.ReadOnlyItems.Values
            .Skip(firstIndex + (uiParameters.PageNumber - 1) * uiParameters.PageSize)
            .Take(uiParameters.PageSize);
    }
    else
        items = MemoryCacheService.Instance.ReadOnlyItems.Values
            .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
            .Take(uiParameters.PageSize);
    return items;
}
NOTE: SortedList<>.Values implements IList, and Skip and Take have special index-based optimizations for it.
You can use this extension method to convert to a SortedList from an IEnumerable:
public static SortedList<TKey, TValue> ToSortedList<T, TKey, TValue>(this IEnumerable<T> items, Func<T, TKey> keyFn, Func<T, TValue> valueFn) {
    var ans = new SortedList<TKey, TValue>();
    foreach (var item in items)
        ans.Add(keyFn(item), valueFn(item));
    return ans;
}
Upvotes: 0
Reputation: 67417
This is not a SQL query, it's an in-memory collection.
Oh boy. This will never work: you're doing linear searches in a database with up to a million elements, 500 times a second?
You need to implement indexing of some sort on top of your collection, even something as simple as recording the collection index of every 50th Id, so you don't have to start from the beginning every time with .Where(x => x.Id > watermark). If it's sorted on Id, even a binary search would be better than what you're doing right now. A sketch of that follows.
Upvotes: 1