Reputation:

An optimized algorithm for the given problem?

I am solving a problem which states that we have a list L containing integers from 1 to N. We have to perform the following operation N−1 times:

Choose two elements of the list, let's denote them by X and Y.
Erase the chosen elements from L.
Append the number X + Y + X*Y to L. At the end, L contains exactly one integer. Find this integer. As the answer may be large, we have to compute it modulo 10^9 + 7

Constraints : 1≤N≤1,000,000

Time Limit : 1 sec

I have written this code which gives the correct answer in linear time but it says time limit exceeded for this approach. Can someone provide a better optimized solution

inline ull cal(ull x, ull y){
  ull ans, i, modno;
  modno = 1000000007;
  i = 1;

  ans = (x + y);
  i = (i*x) % modno;
  i = (i*y) % modno;
  ans = ans + i;
  ans = ans % modno;
  return ans;
}

int main(){
    ull n;
    cin>>n;

    ull sum, modno;
    sum = 0;
    modno = 1000000007;

    if(n == 1)
        cout<<1<<endl;
    else
    {
        sum = n + (n-1) + (n*(n-1));
        n -= 2;
        do
        {
            if(n <= 0)
                break;

            sum = cal(sum, n);
            n -= 1;
        }while(1);
        cout<<ans<<endl;
     }

   return 0;
}

Final code :

    ull n;
    cin>>n;

    if(n == 1)
        cout<<1<<endl;
    else
    {
        ull modno = 1000000007;
        ull ans = 1;
        ull no = n+1;

        while(no >= 1)
        {
            ans = (ans*no);
            if(ans > modno)
                ans = ans%modno;
            no--;
        }

        ans = ans - 1;
        ans = ans % modno;
        cout<<ans<<endl;

Upvotes: 2

Answers (4)

juvian

Reputation: 16068

As others have mentioned, the problem boils down to calculating ((n + 1)! - 1) % p. You can search around about fast methods of doing this (fast factorial modulo prime). One of those that would work under 1s is the one mentioned here

Update: Just checked the problem link from codechef. As usual, the trick lies in the constraints which you haven´t accurately described. You have to do the same task for up to 100000 cases. A single fact(n) mod p can be obtained in under 1 second using standard for loop, as n is small.

What won´t work is calculate fact(n) mod p for every test case. Like many other problems, you can benefit using precomputation: build an array where arr[i] is i! mod p up to i = max value n can take + 1. With this information, you can answer each query (test case) in O(1) by just returning (arr[n + 1] - 1) % p.

Just tried this and got accepted. Next time, please add problem link to your description, it is usually the case that you don´t think something is relevant and that part is the whole answer to the problem.

Upvotes: 1

Brendan

Reputation: 37262

The problem just says "Choose two elements of the list, let's denote them by X and Y." and doesn't say anything about the order that the elements need to be chosen.

Therefore it could be rewritten as:

Split the list into one sub-list per CPU
Using SIMD; calculate (X+1)*(Y+1) for each pair in each CPU's sub-list and store the results in an new list as 64-bit integers so that you can avoid doing the expensive modulo operation
Using SIMD; calculate (X*Y - 1) % 1000000007 for each pair in each CPU's new sub-list and store the results as 32-bit integers.
Repeat the previous 2 steps until you're left with one value from each CPU (and do the final R = (R - 1) % 1000000007 if necessary to bring it back to 32-bit). Store these values in a list and terminate all threads except for one.
Using SIMD; calculate (X+1)*(Y+1) for each pair
Using SIMD; calculate (X+*Y - 1) % 1000000007 for each pair
Repeat the previous 2 steps until you're left with one value

Upvotes: 0

Quimby

Reputation: 19223

There's a closed-form solution for the sum: L = (N+1)!-1

The sum follows this recurrent equation L_N = N + L_(n-1) + N*L_(n-1), L_0=0 which can be obtained by simply always choosing X=L_(N-1) and Y=N ( = the next number to add).

Derivation:

EDIT:

As you posted your final code, I'm posting my benchmark:

#include <iostream>
#include <cstdint>
#include <chrono>

std::uint64_t
factorial(std::uint64_t n) {
    std::uint64_t x = 1;
    while (n > 1)
        x = (x * n--) % 1'000'000'007;
    return x;
}
int
main() {
    std::uint64_t n;
    std::cin >> n;
    std::uint64_t numMicro = 0;
    for (std::size_t i = 0; i < 1'000; ++i) {
        auto start = std::chrono::high_resolution_clock::now();
        volatile std::uint64_t res = factorial(n);
        auto end = std::chrono::high_resolution_clock::now();
        numMicro +=
            std::chrono::duration_cast<std::chrono::microseconds>(end - start)
                .count();
    }
    std::cout << "On average: " << numMicro / 1000.0 << "microseconds";
    return 0;
}

Compiled with -O3, volatile is there only to make sure that the compiler does not optimize the computation away. Your solution is almost the same, way below the 1 second. Not sure what to optimize further.

Upvotes: 3

Lajos Arpad

Reputation: 77053

The algorithm should look like this:

sum <- 1 for index <- 2,n sum = (sum + index + sum * index) mod 1000000007 end for

Explanation: since + and * are commutative and associative, the order in which the items are handled is irrelevant, so you are doing a good job implementing this cycle, but you unnecessarily overcomplicate your cal function.

The other answers tell you to calculate ((n + 1)! - 1) mod modno, which is correct if we forget about the modulo part, but I doubt that calculating ((n + 1)! - 1) mod modno will yield the very same result as computing this in a step-by-step manner regardless of the value of n, because we have + and * in each step. If the other answerers are correct, then you can greatly optimize your algorithm. If not, then optimizing this is not as easy.

Upvotes: 0

An optimized algorithm for the given problem?

Answers (4)

Related Questions