Dunge
Dunge

Reputation: 758

Collections manipulation, need help optimizing this code from a report generator

I'm creating a report generating tool that use custom data type of different sources from our system. The user can create a report schema and depending on what asked, the data get associated based different index keys, time, time ranges, etc. The project is NOT doing queries in a relational database, it's pure C# code in collections from RAM.

I'm having a huge performance issue and I'm looking at my code since a few days and struggle with trying to optimize it.

I stripped down the code to the minimum for a short example of what the profiler point as the problematic algorithm, but the real version is a bit more complex with more conditions and working with dates.

In short, this function return a subset of "values" satisfying the conditions depending on the keys of the values that were selected from the "index rows".

private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var checkContainers = ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
                           values.Any(t => t.ContainerId.HasValue));

    var checkEnterpriseId = ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
                             values.Any(t => t.EnterpriseId.HasValue));

    var ret = new List<LoadedDataSource>();
    foreach (var value in values)
    {
        var valid = true;

        foreach (var index in indexRows)
        {
            // ContainerId
            var indexConservedSource = index.AsEnumerable();
            if (checkContainers && index.CheckContainer && value.ContainerId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.ContainerId.HasValue && t.ContainerId.Value == value.ContainerId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }

            //EnterpriseId
            if (checkEnterpriseId && index.CheckEnterpriseId && value.EnterpriseId.HasValue)
            {
                indexConservedSource = indexConservedSource.Where(t => t.EnterpriseId.HasValue && t.EnterpriseId.Value == value.EnterpriseId.Value);
                if (!indexConservedSource.Any())
                {
                    valid = false;
                    break;
                }
            }
        }

        if (valid)
            ret.Add(value);
    }

    return ret;
}

This works for small samples, but as soon as I have thousands of values, and 2-3 index rows with a few dozens values too, it can take hours to generate.

As you can see, I try to break as soon as a index condition fail and pass to the next value.

I could probably do everything in a single "values.Where(####).ToList()", but that condition get complex fast.

I tried generating a IQueryable around indexConservedSource but it was even worse. I tried using a Parallel.ForEach with a ConcurrentBag for "ret", and it was also slower.

What else can be done?

Upvotes: 0

Views: 42

Answers (1)

Anton&#237;n Lejsek
Anton&#237;n Lejsek

Reputation: 6103

What you are doing, in principle, is calculating intersection of two sequences. You use two nested loops and that is slow as the time is O(m*n). You have two other options:

  1. sort both sequences and merge them
  2. convert one sequence into hash table and test the second against it

The second approach seems better for this scenario. Just convert those index lists into HashSet and test values against it. I added some code for inspiration:

private List<LoadedDataSource> GetAssociatedValues(IReadOnlyCollection<List<LoadedDataSource>> indexRows, List<LoadedDataSource> values)
{
    var ret = values;

    if ((ValueColumn.LinkKeys & ReportLinkKeys.ContainerId) > 0 &&
        ret.Any(t => t.ContainerId.HasValue))
    {
        var indexes = indexRows
            .Where(i => i.CheckContainer)
            .Select(i => new HashSet<int>(i
                .Where(h => h.ContainerId.HasValue)
                .Select(h => h.ContainerId.Value)))
            .ToList();

        ret = ret.Where(v => v.ContainerId == null 
                        || indexes.All(i => i.Contains(v.ContainerId)))
                 .ToList();
    }

    if ((ValueColumn.LinkKeys & ReportLinkKeys.EnterpriseId) > 0 &&
        ret.Any(t => t.EnterpriseId.HasValue))
    {
        var indexes = indexRows
            .Where(i => i.CheckEnterpriseId)
            .Select(i => new HashSet<int>(i
                .Where(h => h.EnterpriseId.HasValue)
                .Select(h => h.EnterpriseId.Value)))
            .ToList();

        ret = ret.Where(v => v.EnterpriseId == null 
                        || indexes.All(i => i.Contains(v.EnterpriseId)))
                 .ToList();
    }

    return ret;
}

Upvotes: 1

Related Questions