John A
John A

Reputation: 35

Algorithm to flip N elements of an array to get sum to zero

Imagine you have an accounting report that consists of multiple hundreds of rows with the following structure:

AccountID Value.
1. -11775,61
2. -11538,40
3. 0,05
4. 26969,32
... ...
500. 1354,25

The total sum in a correct report must be exactly zero, not even a cent off.

The error always comes from some rows having incorrect signs, so we need to find them and flip the sign.

Task

Given a non-zero sum array, the task is to find which rows need to have their signs flipped so that the sum becomes 0. Correct answer is an array of values to be flipped.

Specifics:

My attempts

I decided to ask for guidance here as I'm feeling a bit lost in the direction of my research.

I've tried a sort + bruteforce solution:

This works on many cases, but fails when there are 2 islands or 1 island and 1-2 outlier rows far from the original island.

Research

In general, I found SSP and many different algorithms, but, obviously, O(2^(n/2)) on 500 rows is an impossible task. I found out that given this is a JS snippet to be run in client's browser — to be under 1 second of computation time, the algorithm can try ~2^22 combinations before it must stop.

I haven't tried going into dynamic programming methods to achieve pseudo-polynomial time here yet.

Expectations

I have a feeling, that, given these very specific restrictions on the task above — there should be something simpler.

The correct answers always are 1-2 islands + 1-2 outliers, never anything complex. 99% cases — 5-6 flips solve the sum. So "10 flips" is the worst case, in reality average is ~5.

Like there must be some specific approach I'm missing that could drastically lower the average case complexity by utilizing these restrictions.

Maybe I'm wrong.

Upvotes: 3

Views: 268

Answers (4)

trincot
trincot

Reputation: 350252

I think the approach with the bitmap is fine, in the sense that in a limited area you would look at all on-off combinations for flipping.

To support two islands like that, store the effect of each island configuration in a Map keyed by that amount-difference, and associated with that key store the leftmost index of that island together with the flip-bitmap.

Of course, if any of those islands solves the problem, return that solution.

If not, then iterate these islands, and determine which amount you need to combine it with to get the zero sum. Look in the map for that missing amount (which must be a key in that map). If found, you combine the two locations and return that result.

Some care can to be taken to not combine two overlapping islands, as you would always find an equivalent combination where the islands are disjunct.


To demonstrate this idea, I made a function that creates a random array with 500 amounts in the range of (-100 000,...,100 000), and which applies a maximum number of flips in at most 2 islands. I defined "island" as a set of flips where the two extreme indices are not more than 12 apart. So for example, an island can be 12 consecutive flips, or can be 2 flips at indices 3 and 15, or any variation in between.

The demo below will keep producing new data and calculate the solution for it. If the solution cannot be found within 1 second (which never happend on my PC), or the solution could not be found with at most 2 islands, the output is "not found". Otherwise it is an array with indices where the flips should be applied.

Each time the solution is verified by actually summing up the amounts taking into account the flips and asserting that this sum is zero.

Here is the snippet which keeps running random tests:

const randint = (end) => Math.floor(Math.random() * end);

function randomInput(size, maxFlips, maxIslands) {
    let amounts = [];
    let sum = 0;
    while (size-- > 1) {
        let amount = (Math.floor(Math.random() * 2e7) - 1e7) / 100;
        amounts.push(amount);
        sum += amount;
    }
    // Make sum zero
    while (Math.abs(sum) >= 100000) {
        let i = randint(amounts.length);
        if (amounts[i] * sum > 0) {
            sum -= amounts[i] * 2;
            amounts[i] = -amounts[i];
        }
    }
    amounts.push(-Math.round(sum * 100) / 100);

    // Prepare islands
    let islands = Array(maxIslands).fill(0);
    // Determine size of each island (could remain 0)
    while (maxFlips-- > 0) {
        islands[randint(maxIslands)]++;
    }
    // Apply the flips (could occasionally overlap, making the problem simpler)
    for (let count of islands) {
        let i = randint(amounts.length - count + 1);
        while (count-- > 0) {
            // 80% probability that amount is flipped; so we may get holes in islands
            if (randint(100) < 80) {
                amounts[i + count] = -amounts[i + count]; // Flip!
            }
        }
    }
    return amounts;
}

function solve(amounts) {
    let deadline = performance.now() + 900;
    const map = new Map;
    
    function memo(sum, loc) {
        if (map.has(sum)) map.get(sum).push(loc);
        else map.set(sum, [loc]);
    }

    function recur(i, j, bits, width, sum) {
        if (width == 0) return memo(sum, [i, bits]);
        recur(i, j + 1, bits * 2, width - 1, sum);
        recur(i, j + 1, bits * 2 + 1, width - 1, sum + amounts[j]*2);
    }
    
    function expand([i, bits]) {
        return Array.from(bits.toString(2), (bit, j) => +bit ? i + j : -1)
                    .filter(i => i >= 0);        
    }

    // Convert to cents
    amounts = amounts.map(val => Math.round(val * 100));
    
    let sum = amounts.reduce((a, b) => a + b);
    if (sum == 0) return []; // No flips
    
    const WIDTH = 12;
    // Collect islands with at least one flip (at i)
    for (let i = 0; i < amounts.length; i++) {
        recur(i, i + 1, 1, WIDTH - 1, amounts[i]*2);
    }
    // Solution with one island?
    if (map.has(sum)) return expand(map.get(sum)[0]);
    // Look for solutions with two islands...
    for (let [sum1, islands] of map) {
        if (map.has(sum - sum1)) {
            for (let [i, bits1] of map.get(sum - sum1)) {
                for (let [j, bits2] of islands) {
                    if (i >= j + WIDTH) return expand([j, bits2]).concat(expand([i, bits1]));
                    else if (j >= i + WIDTH) return expand([i, bits1]).concat(expand([j, bits2]));
                }
            }
        }
        if (performance.now() >= deadline) break;
    }
    return "not found";
}

function verify(amounts, flips) {
    let sum = Math.round(amounts.reduce((acc, val, i) => acc + Math.round(100*val) * (flips.includes(i) ? -1 : 1), 0));
    if (sum != 0) {
        console.log("amounts", JSON.stringify(amounts));
        console.log("flips", JSON.stringify(flips));
        throw "Wrong solution detected! sum=" + sum;
    }
}

// I/O handling
const input = document.querySelector("textarea");
const [output, time] = document.querySelectorAll("span");
const pause = document.querySelector("input");
let pausing = false;

setInterval(function repeat() {
    if (pausing) return;
    const amounts = randomInput(500, 12, 2);
    const start = performance.now();
    const flips = solve(amounts);
    time.textContent = Math.ceil(performance.now() - start);
    verify(amounts, flips);
    
    input.value = amounts;
    output.textContent = flips;
}, 1000);

pause.onchange = () => pausing = pause.checked;
textarea { width: 100%; height: 6em }
500 amounts: <textarea readonly></textarea><br>
Flips found: <span></span><br>
Time elapsed: <span></span> ms<br>
<input type="checkbox">Pausing

Note that a new test is executed every second. This is not the time it takes to produce the result. The time needed for the processing is displayed in the output.

Secondly, there are often many solutions. This algorithm doesn't look for the solution that needs the least flips, although it will give precedence to one-island solutions. But as there are often many solutions, the solution will often include the very first index, as that index is included in the first islands, which are combined first with all other islands.

Upvotes: 1

btilly
btilly

Reputation: 46409

I tried the dynamic programming thing. Performance is..not as desired. The problem is that the combinatorial explosion in possibilities goes too fast.

But if you only have 2-3 groups of flips, the following may be good enough?

function findFlips (accounts) {
  for (let i = 0; i < accounts.length; i++) {
    accounts[i]['cents'] = Math.round(accounts[i].value * 100);
  }

  let sum = accounts.reduce((a, b) => a+b.cents, 0);
  // -2 is a hack to make the start of the next group be 0.
  const start = {"sum": sum, "flips": 0, "i": -2, "decisions": 0, "prev": null};
  let best = {}
  best[start.sum] = [start];
  let todo = [start];

  let d = new Date;
  let startTime = d.getTime();
  let count = 0;

  while (0 < todo.length) {
    let current = todo.shift();
    if (current.sum == 0) {
      let answer = [];
      while (current.prev != null) {
        answer.unshift(current.i);
        current = current.prev;
      }
      return answer;
    }

    // Did we take too long?
    count++;
    if (0 == count%10000) {
      d = new Date;
      if (999 < d.getTime() - startTime) {
        return null;
      }
    }

    for (let i = current.i+2; i < accounts.length; i++) {
      let next = {
        "sum": current.sum - 2*accounts[i].cents,
        "flips": current.flips + 1,
        "i": i,
        "decisions": current.decisions+1,
        "prev": current};
      let j = i;
      while (j < accounts.length && next.flips < 11 && next.decisions < 5) {
        let existing = (best[next.sum] || [])
        let k = 0;
        while (k < existing.length) {
          if (existing[k].i <= next.i && existing[k].flips <= next.flips) {
            break;
          }
          k++;
        }
        if (k == existing.length) {
          existing.push(next);
          best[next.sum] = existing;
          todo.push(next);
        }
        j++;
        if (j < accounts.length && j < i + 4) {
          let next2 = next;
          next = {
            "sum": next2.sum - 2*accounts[j].cents,
            "flips": next2.flips + 1,
            "i": j,
            "decisions": next2.decisions,
            "prev": next2};
        }
      }
    }
  }
  return null;
}

let accounts = [
  {'accountId': 1, 'value': 24.12},
  {'accountId': 2, 'value': 128.16},
  {'accountId': 3, 'value': -545.43},
  {'accountId': 4, 'value': 191.92},
  {'accountId': 5, 'value': 238.01},
  {'accountId': 6, 'value': -500.36},
  {'accountId': 7, 'value': -81.22},
  {'accountId': 8, 'value': 132.50},
  {'accountId': 9, 'value': 246.80},
  {'accountId': 10, 'value': 93.80},
  {'accountId': 11, 'value': 389.00},
  {'accountId': 12, 'value': -331.00},
  {'accountId': 13, 'value': -332.00},
  {'accountId': 14, 'value': 321.00},
  {'accountId': 15, 'value': 321.06},
  {'accountId': 16, 'value': -336.04},
  {'accountId': 17, 'value': -491.02},
  {'accountId': 18, 'value': -119.06},
  {'accountId': 19, 'value': 245.69},
  {'accountId': 20, 'value': 404.07},
];

console.log(findFlips(accounts));
// Do some flips.
[2, 3, 11, 18].forEach((i) => accounts[i].value = - accounts[i].value);

console.log(findFlips(accounts));

accounts[2].value += 0.03;
console.log(findFlips(accounts));

Upvotes: 0

גלעד ברקן
גלעד ברקן

Reputation: 23955

Since there is no requirement to make a specific number of flips, we can store only the minimal number of flips necessary to create a sum up to the ith element,

for each element:
  for each sum stored in the previous round:
    if the number of its flips is less than 10:
      make a new record of sum with flipped value
      if this sum can now be associated with
      less flips, update that
    make a new record of sum with given value
    if this sum can now be associated with
    less flips, update that
    

Upvotes: 0

ADdV
ADdV

Reputation: 1252

Comparison to subset sum

As you rightfully point out, the general case of your problem is equivalent to subset sum, which is NP-complete. The conversion is simple, we know how far away from 0 we are, so we have to find a subset of rows with value equal to exactly half that difference. Flipping those will make all data sum to zero.

However, given the general (constraints on the) form of the solution, we can come up with a solution that may be able to usually find a solution in reasonable time.

Constrained solution

First off, you should use integers representing cents (by multiplying by 100) so we don't have to deal with floating points.

Now, we can compute and store every value we can make with at most 2 of the values. This is only 500^2 combinations, which is easily fast enough. We store these in a datastructure that allows for O(1) lookup, such as a hashmap, and we also store the indices we used.

Then we iterate over all contiguous "islands" of at most, say, size 10. There are about 10*500 of these, which is still not a problem. You could even completely relax this constraint and simply look at all ~500*500/2 islands. We subtract the sum of these islands from the target number to find the amount we still need to flip, and check the hashmap if that value is possible to make with at most 2 rows. If it is, we finally need to check if the latter rows are included within the island, and reject the solution in that case.

This strategy allows for efficient search over all possible combinations of an island and one or two stray flips.

Upvotes: 0

Related Questions