Reputation: 125
I can't come up with a solution to filter a huge (30,000+ items) collection in a very performant way...
Here is the case I have: We have a scene with scene objects. I serialize these scene objects to a collection. Every scene object is represented as an instance of SceneObject class which includes data about object's poisition in scene, its rotation etc. All of these objects can have positions in range from (-1000, -1000) to (1000, 1000) coordinates. We should be able to get a list of objects that have position in a range of coordinates. For example we need to get a list of objects that have positions in range from (150, -800) to (250, -600).
The problem is that the operation of getting results should be very performant and shouldn't allocate any garbage. I could think of indexing this list once but I can't undetstand how to make it correctly.
So what is the best way to do it?
Here is how I do it now. I stored all objects in SQLite database table and query needed objects with SELECT query, but the performance of such approach is slow and it allocated garbage:
List<SceneObjectData> objectsToStream;
void LoadSceneObjectsAroundPlayer()
{
objectsToStream = dbManager.Query<SceneObjectData>(BuildSqlQuery());
foreach (SceneObjectData sceneObjectData in objectsToStream)
{
LoadObject(sceneObjectData);
}
}
string BuildSqlQuery()
{
query = "SELECT * FROM SceneObjectData" +
" WHERE " + "((objectsSizeType = " + (int)SceneObjectSize.huge +
" AND " + "xPosition > " + (tempPosition.x - hugeObjectsRadius).ToString() +
" AND " + "xPosition < " + (tempPosition.x + hugeObjectsRadius).ToString() +
" AND " + "zPosition > " + (tempPosition.z - hugeObjectsRadius).ToString() +
" AND " + "zPosition < " + (tempPosition.z + hugeObjectsRadius).ToString() + ")" +
" OR " + "(objectsSizeType = " + (int)SceneObjectSize.big +
" AND " + "xPosition > " + (tempPosition.x - bigObjectsRadius).ToString() +
" AND " + "xPosition < " + (tempPosition.x + bigObjectsRadius).ToString() +
" AND " + "zPosition > " + (tempPosition.z - bigObjectsRadius).ToString() +
" AND " + "zPosition < " + (tempPosition.z + bigObjectsRadius).ToString() + ")" +
" OR " + "(objectsSizeType = " + (int)SceneObjectSize.medium +
" AND " + "xPosition > " + (tempPosition.x - mediumObjectsRadius).ToString() +
" AND " + "xPosition < " + (tempPosition.x + mediumObjectsRadius).ToString() +
" AND " + "zPosition > " + (tempPosition.z - mediumObjectsRadius).ToString() +
" AND " + "zPosition < " + (tempPosition.z + mediumObjectsRadius).ToString() + ")" +
" OR " + "(objectsSizeType = " + (int)SceneObjectSize.small +
" AND " + "xPosition > " + (tempPosition.x - smallObjectsRadius).ToString() +
" AND " + "xPosition < " + (tempPosition.x + smallObjectsRadius).ToString() +
" AND " + "zPosition > " + (tempPosition.z - smallObjectsRadius).ToString() +
" AND " + "zPosition < " + (tempPosition.z + smallObjectsRadius).ToString() + "))" +
" AND isLoaded = 0";
query = query.Replace(",", ".");
return query;
}
I would very appreciate any suggestion or advice!
Upvotes: 0
Views: 409
Reputation: 1063198
It is surprisingly expensive to construct a SQL query (even with static text), add parameters in ADO.NET, send it over the wire, collect the results, and materialize that into an object model. Fine for discreet operations, or line-of-business apps, but not good for constantly working with the same data. For that, it would be better to load the data once into an in-memory model, and work in local memory. By any objective measure, 30k objects is tiny and you shouldn't really need to optimize it, but if this number was order(s) of magnitude higher, you could get more advanced by breaking your coordinate system into smaller grids, keeping each chunk separately, so you can very quickly omit entire chunks (and everything they contain), only actually scanning the chunks that overlap your query range.
In a simple test here, it seems to take about .25ms locally to scan 30k items based on a double
X/Y range, the naïve way:
using System;
using System.Collections.Generic;
using System.Diagnostics;
class P
{
static void Main()
{
var list = new List<SceneItem>();
var rand = new Random();
for(int i = 0; i < 30_000; i++)
{
list.Add(new SceneItem {
X = (rand.NextDouble() * 2000) - 1000,
Y = (rand.NextDouble() * 2000) - 1000 }
);
}
Console.WriteLine(CountItems(list, 100, -800, 250, -600));
var watch = Stopwatch.StartNew();
for (int i = 0; i < 10000; i++)
{
CountItems(list, 100, -800, 250, -600);
}
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds); // note: benchmark.net would be better here
static int CountItems(List<SceneItem> items, double fromX, double fromY, double toX, double toY)
{
int count = 0;
foreach (var item in items)
{
if (item.X >= fromX && item.Y >= fromY && item.X < toX && item.Y < toY)
count++;
}
return count;
}
}
}
public class SceneItem
{
public double X { get; set; }
public double Y { get; set; }
}
Note also: avoid switching to LINQ here; anything that involves a capture context (so: lambdas, query expressions, etc) and delegate/expression trees, is going to create lots of allocations. It is also very deliberate that I'm using a simple container type; List<T>
and arrays like T[]
can be iterated without allocations; abstractions like IList<T>
, IEnumerable<T>
etc cannot.
Note that if you need to update an in-memory model from a database, you can use tricks like CHECKSUM_AGG
to check whether there are likely changes, without having to fetch everything each time.
Upvotes: 4