Ross Bush
Ross Bush

Reputation: 15185

Performant running SUM OVER Partition in LINQ

I am trying to figure out the best way to calculate a running sum partition with a self joined collection using LINQ.

The query below is a somewhat simple example of what I am after. The output is the RowNumber, the RowType and the sum of all preceding RowValues within the current row's RowType.

DECLARE @T TABLE (RowNumber INT, RowType INT, RowValue INT) 

INSERT @T VALUES (1,1,1),(2,1,1),(3,1,1),(4,1,1),(5,1,1),(6,2,1),(7,2,1),(8,2,1),(9,2,1),(10,2,1) 

;WITH Data AS(SELECT RowNumber, RowType,RowValue FROM @T)

SELECT
    This.RowNumber,
    This.RowType,
    RunningValue = COALESCE(This.RowValue + SUM(Prior.RowValue),This.RowValue)
FROM
    Data This
    LEFT OUTER JOIN Data Prior ON Prior.RowNumber <  This.RowNumber AND Prior.RowType = This.RowType
GROUP BY
    This.RowNumber,
    This.RowType,
    This.RowValue
/* OR
SELECT
    This.RowNumber,
    This.RowType,
    RunningValue = SUM(RowValue) OVER (PARTITION BY RowType ORDER BY RowNUmber)
FROM
    Data This
*/

Now, my not working attempt.

var joinedWithPreviousSums = allRows.Join(
    allRows,
    previousRows => new {previousRows.RowNumber, previousRows.RowType, previousRows.RowValue}, 
    row=> new { row.RowNumber, row.RowType, row.RowValue}, 
    (previousRows, row) => new { row.RowNumber, row.RowType, row.RowValue })
    .Where(previousRows.RowType == row.RowType && previousRows.RowNumber < row.RowNumber)
    .Select(row.RowNumber, row.RowType,RunningValue = Sum(previousRows.Value) + row.RowValue)).ToList()

Of course, the last two lines above are garbage and attempt to exemplify my desired projection while hinting at my lack of knowledge on performant complex LINQ projections.

I have read where some variation of the statement below could work and may be workable, however, is there a way to achieve similar results without yielding?

int s = 0;
var subgroup  = people.OrderBy(x => x.Amount)
                      .TakeWhile(x => (s += x.Amount) < 1000)
                      .ToList(); 

EDIT : I have been able to get the snippet below to work, however, I cant seem to partition or project over RowType.

namespace ConsoleApplication1
{
    class Program
    {
        delegate string CreateGroupingDelegate(int i);

        static void Main(string[] args)
        {
            List <TestClass> list = new List<TestClass>() 
               {
                    new TestClass(1, 1, 1),
                    new TestClass(2, 2, 5), 
                    new TestClass(3, 1, 1 ),
                    new TestClass(4, 2, 5),
                    new TestClass(5, 1, 1),
                    new TestClass(6, 2, 5)
            };
            int running_total = 0;

            var result_set = list.Select(x => new { x.RowNumber, x.RowType, running_total = (running_total = running_total + x.RowValue) }).ToList();


            foreach (var v in result_set)
            {
                Console.WriteLine("list element: {0}, total so far: {1}",
                    v.RowNumber,
                    v.running_total);
            }

            Console.ReadLine();
        }
    }

    public class TestClass
    {
        public TestClass(int rowNumber, int rowType, int rowValue)
        {
            RowNumber = rowNumber;
            RowType = rowType;
            RowValue = rowValue;
        }

        public int RowNumber { get; set; }
        public int RowType { get; set; }
        public int RowValue { get; set; }
    }

}

Upvotes: 1

Views: 1184

Answers (2)

NetMage
NetMage

Reputation: 26927

Your answer can be simplified greatly, but does scale poorly even then, as it must go through the Where for each row to compute each row, so O(list.Count^2).

Here is the simpler version, which preserves the original order:

var result = list.Select(item => new {
    RowType = item.RowType,
    RowValue = list.Where(prior => prior.RowNumber <= item.RowNumber && prior.RowType == item.RowType).Sum(prior => prior.RowValue)
});

You can go through list once if are willing to sort. (If you know the order is correct, or can use a simpler sort, you can remove or replace the OrderBy/ThenBy.)

var ans = list.OrderBy(x => x.RowType)
              .ThenBy(x => x.RowNumber)
              .Scan(first => new { first.RowType, first.RowValue },
                    (res, cur) => res.RowType == cur.RowType ? new { res.RowType, RowValue = res.RowValue + cur.RowValue }
                                                             : new { cur.RowType, cur.RowValue }
              );

This answer uses an extension method that is like Aggregate, but returns the intermediate results, based on the APL scan operator:

// TRes seedFn(T FirstValue)
// TRes combineFn(TRes PrevResult, T CurValue)
public static IEnumerable<TRes> Scan<T, TRes>(this IEnumerable<T> src, Func<T, TRes> seedFn, Func<TRes, T, TRes> combineFn) {
    using (var srce = src.GetEnumerator()) {
        if (srce.MoveNext()) {
            var prev = seedFn(srce.Current);

            while (srce.MoveNext()) {
                yield return prev;
                prev = combineFn(prev, srce.Current);
            }
            yield return prev;
        }
    }
}

Upvotes: 2

Ross Bush
Ross Bush

Reputation: 15185

My eyes were glazed over after seeing this. The answer to my long winded question after 6 hours of skulldrugery seems to be as simple as this. Thanks to @NetMage for pointing out the SelectMany that I was missing.

var result = list.SelectMany(item => list.Where(x => x.RowNumber <= item.RowNumber && x.RowType == item.RowType)
.GroupBy(g => g.RowType)
.Select(p => new
{
    RowType = p.Max(s => s.RowType),
    RowValue = p.Sum(s => s.RowValue)
}));

Upvotes: 0

Related Questions