Tomas Pajonk
Tomas Pajonk

Reputation: 5202

What is a good datastructure to keep cumulative values in?

I am looking for an idea, concept or proven datastructure that would be very efficient when accessing a collection that keeps cumulative values.

Example may shed more light on my need:

I have a list of values (2,3,5). This list when looking at the cumulative values would be (2,5,10).

I will now add 1 at the start of the list and get (1,2,3,5) and in cumulative terms (1,3,6,11).

I only need to look at the cumulative values, I am not at all interested in the 1,2,3,5. I need to be able to quickly insert at position, remove a position and all this should quickly update the cumulative array (ideally without iterating throughout the whole array and recalculating the values.

Any ideas or hints ?

@Kristo (too long to put in the commentary): To clarify why negative numbers would make the total sum value meaningless please follow this example.

Insert 1 followed by -1. Sum is 1 than 0. (1,-1) // (1,0) Insert 3 followed by insert -3. Sum is 3 then 0. (1,3,-1,-3) // (1,4,3,0) Insert 2 followed by insert -2. Sum is 2 then 0. (1,3,2,-1,-2,-3) // (1,4,6,5,3,0)

If my "magic number" were 4 total sum wouldn't tell me whether I've exceeded it.

PS: The main reason for this is to be able to tell if I went over a certain value and where in the chain.

Upvotes: 11

Views: 3596

Answers (8)

Alexandre Bouchard
Alexandre Bouchard

Reputation: 1

Use a https://en.wikipedia.org/wiki/Fenwick_tree.

This will have expected runtime complexity growing logarithmically in the number of elements instead of linearly as in the naive implementation.

Upvotes: 0

bill
bill

Reputation: 1361

  • You can look at a data structure for cumulative frequency the Binary Indexed Trees
  • You can break your value range into fixed bit ranges. Ex. 3 intervals:

    #define NUM (1<<24)  // max value in your data set
    #define BITS0 8
    #define BITS1 8
    int cum0[NUM >> (BITS0+BITS1)]; // sum of cum1
    int cum1[NUM >> BITS1]; // sum of count
    int count[NUM];
    
    int add(id, val) { // add a value
      cum0[id >> (BITS0+BITS1)] += val;
      cum1[id >> BITS1] += val; 
      count[id] += val;                     
    }
    
    int cumvalue(int id) { int cum = 0; // return cum value at index id         
      for(i = 0; i < (id >> (BITS0+BITS1));i++) cum += cum0[i]; i <<= BITS0;
      for(i = (id & ~((1 << (BITS0+BITS1))-1)) >> BITS1; i < (id >> BITS1); i++) cum+= cum1[i]; i <<= BITS1;
      for(i = id & ~((1 << BITS1) -1); i < id; i++) cum += count[i];            
      return cum;
    }
    

Upvotes: 1

Michael Kristofik
Michael Kristofik

Reputation: 35188

Using C++ terms, could you get away with a std::list (easy inserts/removes in the middle) or std::set (always sorted) for the data and one variable to hold the sum? On every insert/remove, you modify the sum as appropriate. The sum represents the highest number in your would-be cumulative list. Only when you bust your magic number do you need to do some algorithmic work to figure out where you busted.

Update:

Based on your new information, I don't see many shortcuts available. You need to insert or remove frequently from the middle, so that suggests some sort of linked list approach. You can save a little bit of calculation by only updating the portions of the list that have changed. Let L be the list of values and n be the desired location in the list. To insert value x at location n:

  • Insert the value x + L(n-1) at location n
  • Add x to all elements after this new n
  • Stop if you bust your magic number

The procedure is the same for a removal, except you subtract from all subsequent values. This way, you're only doing a lot of work if you insert near the beginning.

Upvotes: 1

Craig Gidney
Craig Gidney

Reputation: 18296

Use a binary search tree with the additional property that nodes contain the sum of their subtree. All the operations are still O(lg n). To insert or remove a value you do the normal procedure, and also update the sums of all parents. Getting the sum is as easy as finding the node containing the element and returning its sum minus its right child's sum.

Upvotes: 5

Dolphin
Dolphin

Reputation: 4772

The only optimization I can think of is to do a 'lazy' evaluation of the cumulative list.

in addition to your list of source values, keep track of a number of the highest position in the cumulative list that is accurate. if you need a number higher than that then you walk up the list, updating the values, and the index.

idx  values       cumulative    operation
 3   (2,3,5)      (2, 5, 10)
 0   (1,2,3,5)    (X,X,X,X)     insert 1 at 0 
 3   (1,2,3,5)    (1,3,6,X)     look for value over 5     
 3   (1,2,3,5,4)  (1,3,6,X,X)   insert 4 at 4 

if course, this wont do you a whole lot of good if you are usually adding items early in the list....

Upvotes: 5

SergioL
SergioL

Reputation: 4963

In C#, I would keep all the actual values in a list, and use a custom iterator to loop through the cumulative values.

You'll only recalculate up to the point where the iterator tells you that you've past your limit (obviously, you'd have to code for that).

I think the value is that you can add/remove without doing any calculations until it's time to iterate over the list (which I think you need to do anyway to find the cut-off number).

Upvotes: 3

Osama Al-Maadeed
Osama Al-Maadeed

Reputation: 5695

There are two simple ways that I see, both use a basic data types - lists.

  1. Keep the original list, and recalculate the cumulatives on every change.

  2. Only keep the cumulative list, and only add to it or delete by using functions like:

    • Add(item,position default is end of list) would add the value of the item starting from position-1.
    • Delete(position) would calculate the original value be subtracting two numbers, then decreases this number from the rest of the list before deleting the item.

    Add 2 : (2) adds 2 to the empty list.

    Add 3 : (2,5) adds 3 at the end of the list to the prior element (2).

    Add 5 : (2,5,10) adds 5 at the end of the list to the prior element (5).

    Add 1 at start: (1,3,6,11) adds 1 at the start of the list, and increments by 1 till the end (no prior elements).

    Add 7 at 2nd position: (1,8,11,14,19) adds 7 and increments by 7 till the end (no prior elements).

    Delete 3rd position (The 11) : (1,8,3,8) get the value, delete it, add the value to the rest.

This way would keep it all synchronized without keeping the original values.

Upvotes: 1

Igor Krivokon
Igor Krivokon

Reputation: 10275

Check the cumulative frequency table.

Upvotes: 5

Related Questions