Reputation: 758
I'm creating a report generating tool that use custom data type of different sources from our system. The user can create a report schema and depending on what asked, the data get associated based different index keys, time, time ranges, etc. The project is NOT doing queries in a relational database, it's pure C# code in collections from RAM.
I'm having a huge performance issue and I'm looking at my code since a few days and struggle with trying to optimize it.
I stripped down the code to the minimum for a short example of what the profiler point as the problematic algorithm, but the real version is a bit more complex with more conditions and working with dates.
In short, this function return a subset of "values" satisfying the conditions depending on the keys of the values that were selected from the "index rows".
private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
var checkContainers = ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
values.Any(t => t.ContainerId.HasValue));
var checkEnterpriseId = ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
values.Any(t => t.EnterpriseId.HasValue));
var ret = new List<LoadedDataSource>();
foreach (var value in values)
{
var valid = true;
foreach (var index in indexRows)
{
// ContainerId
var indexConservedSource = index.AsEnumerable();
if (checkContainers && index.CheckContainer && value.ContainerId.HasValue)
{
indexConservedSource = indexConservedSource.Where(t => t.ContainerId.HasValue && t.ContainerId.Value == value.ContainerId.Value);
if (!indexConservedSource.Any())
{
valid = false;
break;
}
}
//EnterpriseId
if (checkEnterpriseId && index.CheckEnterpriseId && value.EnterpriseId.HasValue)
{
indexConservedSource = indexConservedSource.Where(t => t.EnterpriseId.HasValue && t.EnterpriseId.Value == value.EnterpriseId.Value);
if (!indexConservedSource.Any())
{
valid = false;
break;
}
}
}
if (valid)
ret.Add(value);
}
return ret;
}
This works for small samples, but as soon as I have thousands of values, and 2-3 index rows with a few dozens values too, it can take hours to generate.
As you can see, I try to break as soon as a index condition fail and pass to the next value.
I could probably do everything in a single "values.Where(####).ToList()", but that condition get complex fast.
I tried generating a IQueryable around indexConservedSource but it was even worse. I tried using a Parallel.ForEach with a ConcurrentBag for "ret", and it was also slower.
What else can be done?
Upvotes: 0
Views: 42
Reputation: 6103
What you are doing, in principle, is calculating intersection of two sequences. You use two nested loops and that is slow as the time is O(m*n). You have two other options:
The second approach seems better for this scenario. Just convert those index lists into HashSet and test values against it. I added some code for inspiration:
private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
var ret = values;
if ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
ret.Any(t => t.ContainerId.HasValue))
{
var indexes = indexRows
.Where(i => i.CheckContainer)
.Select(i => new HashSet<int>(i
.Where(h => h.ContainerId.HasValue)
.Select(h => h.ContainerId.Value)))
.ToList();
ret = ret.Where(v => v.ContainerId == null
|| indexes.All(i => i.Contains(v.ContainerId)))
.ToList();
}
if ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
ret.Any(t => t.EnterpriseId.HasValue))
{
var indexes = indexRows
.Where(i => i.CheckEnterpriseId)
.Select(i => new HashSet<int>(i
.Where(h => h.EnterpriseId.HasValue)
.Select(h => h.EnterpriseId.Value)))
.ToList();
ret = ret.Where(v => v.EnterpriseId == null
|| indexes.All(i => i.Contains(v.EnterpriseId)))
.ToList();
}
return ret;
}
Upvotes: 1