John Mc Murray
John Mc Murray

Reputation: 363

C++ Perform calculations on a huge array

I was asked a question for a job interview and I did not know the correct answer....

The question was:

If you have an array of 10 000 000 ints between 1 and 100, determine (efficiently) how many pairs of these ints sum up to 150 or less.

I don't know how to do this without a loop within a loop, but that is not very efficient.

Does anyone please have some pointers for me?

Upvotes: 3

Views: 898

Answers (3)

Open AI - Opting Out
Open AI - Opting Out

Reputation: 24133

These kind of questions always require a mixture of mathematical insight and efficient programming. They don't want brute force.

First Insight

Numbers can be grouped according to how they will pair with other groups.

Putting them into:

1 - 50 | 51 - 75 | 76 - 100
  A    |    B    |    C
  • Group A can pair with anything.
  • Group B can pair with A and B, and possibly C
  • Group C can pair with A and possibly B, but not C

The possibly is where we need some more insight.

Second Insight

For each number in B we need to check how many numbers there are up to its complement with 150. For example, with 62 from group B we want to know from group C how many numbers are less than or equal to 88.

For each number in C we add up the tallies up to it, e.g. tallies for 76, 77, 78, ..., 88. This is known mathematically as the partial sum.

In the standard library there is a function which produces a partial_sum

vector<int> tallies(25); // this is room for the tallies from C
vector<int> partial_sums(25);

partial_sum(tallies.begin(), tallies.end(), partial_sums.begin());

Symmetry means this sum only needs to be done for one group.

Third (much later) insight

Calculating the totals for group A and B can be done using partial_sum, too. So rather than only calculating for group C, and having to track the totals some other way, just store the totals for each number from 1 to 100, and then create the partial_sum over the whole thing. partial_sums[50] will give you the amount of numbers less than or equal to 50, partial_sums[75] those less than or equal to 75, and partial_sums[100] should be 10 million, i.e. all the numbers less than or equal to 100.

Finally we can calculate the combinations from B and C. We want to add together all the multiples of totals for 50 and 100, 51 and 99, 52 and 98, etc. we can do this by iterating through the tallies from 50 to 75 and the partial_sums from 100 to 75. There is a standard library function inner_product which can handle this.

This seems quite linear to me.

random_device rd;
mt19937 gen(rd());
uniform_int_distribution<> dis(1, 100);

vector<int> tallies(100);
for(int i=0; i < 10000000; ++i) {
    tallies[dis(gen)]++;
}

vector<int> partial_sums(100);
partial_sum(tallies.begin(), tallies.end(), partial_sums.begin());

int A = partial_sums[50];
int AB = partial_sums[75];
int ABC = partial_sums[100]; 
int B = AB - A;
int C = ABC - AB;

int A_match = A * ABC;
int B_match = B * B;
int C_match = inner_product(&tallies[50], &tallies[75],
                            partial_sums.rend(), 0);

Upvotes: 0

Tobias Langner
Tobias Langner

Reputation: 10808

first, I would sort the array. Then you start a single pass through the sorted array. You get the single value n in that cell and find the correspondent lowest value that is still allowed (e.g. for 15 it is 135). Now you find the index of this value in the array and that's the amount of pairs for n. Sum up all these and you have (if my mind is working correctly) counted each pair twice, so if you divide the sum by 2, you have the correct number.

The solution should be O(n log n) compared to the trivial one, which is O(n^2)

Upvotes: 0

thang
thang

Reputation: 3466

One way is by creating a smaller array of 100 elements. Loop through the 10,000,000 elements and count how many of each. Store the counter in the 100 element array.

    // create an array counter of 101 elements and set every element to 0
    for (int i = 0; i < 10000000; i++) {
          counter[input[i]]++;
    }

then do a second loop j from 1 to 100. inside that, have a loop k from 1 to min(150-j,j). if k!=j, add counter[j]*counter[k]. if k=j, add (counter[j]-1)*counter[j].

the total sum is your result.

Your total run time is bounded on the top by 10,000,000 + 100*100 = 10,010,000 (it's actually smaller than this).

This is a lot faster than (10,000,000)^2, which is 100,000,000,000,000.

Of course, you have to give up 101 int space in memory.

Delete counter when you're done.

Note also (as pointed out in the discussion below) that this is assuming that order matters. If order doesn't matter, just divide the result by 2.

Upvotes: 7

Related Questions