Reputation:
I am solving a problem which states that we have a list L containing integers from 1 to N. We have to perform the following operation N−1 times:
Constraints : 1≤N≤1,000,000
Time Limit : 1 sec
I have written this code, which gives the correct answer in linear time, but it is rejected with "time limit exceeded". Can someone provide a better-optimized solution?
inline ull cal(ull x, ull y){
ull ans, i, modno;
modno = 1000000007;
i = 1;
ans = (x + y);
i = (i*x) % modno;
i = (i*y) % modno;
ans = ans + i;
ans = ans % modno;
return ans;
}
int main(){
ull n;
cin>>n;
ull sum, modno;
sum = 0;
modno = 1000000007;
if(n == 1)
cout<<1<<endl;
else
{
sum = n + (n-1) + (n*(n-1));
n -= 2;
do
{
if(n <= 0)
break;
sum = cal(sum, n);
n -= 1;
}while(1);
cout<<sum<<endl;
}
return 0;
}
Final code :
ull n;
cin>>n;
if(n == 1)
cout<<1<<endl;
else
{
ull modno = 1000000007;
ull ans = 1;
ull no = n+1;
while(no >= 1)
{
ans = (ans*no);
if(ans > modno)
ans = ans%modno;
no--;
}
ans = ans - 1;
ans = ans % modno;
cout<<ans<<endl;
}
Upvotes: 2
Views: 215
Reputation: 16068
As others have mentioned, the problem boils down to calculating ((n + 1)! - 1) % p. You can search around about fast methods of doing this (fast factorial modulo prime). One of those that would work under 1s is the one mentioned here
Update: Just checked the problem link from codechef. As usual, the trick lies in the constraints, which you haven't accurately described. You have to do the same task for up to 100000 test cases. A single fact(n) mod p can be obtained in under 1 second using a standard for loop, as n is small.
What won't work is calculating fact(n) mod p from scratch for every test case. As in many other problems, you can benefit from precomputation: build an array where arr[i] is i! mod p, up to i = (max value n can take) + 1. With this information, you can answer each query (test case) in O(1) by just returning (arr[n + 1] - 1) % p.
Just tried this and got accepted. Next time, please add the problem link to your description; it is often the case that the part you don't think is relevant turns out to be the whole answer to the problem.
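The precomputation described above could be sketched roughly as follows. This is a minimal illustration, not the accepted submission: the names `build_factorials` and `answer` are hypothetical, and the table limit is assumed to be the largest n plus 1. `MOD` is added before subtracting 1 so the unsigned value cannot wrap.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t MOD = 1000000007ULL;

// Builds fact[i] = i! mod MOD for every i in [0, limit].
// (Hypothetical helper name, chosen for illustration.)
std::vector<std::uint64_t> build_factorials(std::size_t limit) {
    std::vector<std::uint64_t> fact(limit + 1, 1);
    for (std::size_t i = 1; i <= limit; ++i)
        fact[i] = fact[i - 1] * i % MOD;  // both factors are below 2^32, no overflow
    return fact;
}

// Answers one query in O(1): ((n + 1)! - 1) mod MOD.
// Requires n + 1 <= limit used when building the table.
std::uint64_t answer(const std::vector<std::uint64_t>& fact, std::size_t n) {
    return (fact[n + 1] + MOD - 1) % MOD;
}
```

With the stated constraint N ≤ 1,000,000, the table would be built once with `build_factorials(1000001)` before reading the test cases, and each case would then call `answer(fact, n)`.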
Upvotes: 1
Reputation: 37262
The problem just says "Choose two elements of the list, let's denote them by X and Y." and doesn't say anything about the order in which the elements need to be chosen.
Therefore it could be rewritten as:
1. Split the list into one sub-list per CPU.
2. Using SIMD; calculate (X+1)*(Y+1) for each pair in each CPU's sub-list and store the results in a new list as 64-bit integers so that you can avoid doing the expensive modulo operation.
3. Using SIMD; calculate (X*Y - 1) % 1000000007 for each pair in each CPU's new sub-list and store the results as 32-bit integers.
4. Repeat the previous 2 steps until you're left with one value from each CPU (and do the final R = (R - 1) % 1000000007 if necessary to bring it back to 32-bit). Store these values in a list and terminate all threads except for one.
5. Using SIMD; calculate (X+1)*(Y+1) for each pair.
6. Using SIMD; calculate (X*Y - 1) % 1000000007 for each pair.
7. Repeat the previous 2 steps until you're left with one value.
Upvotes: 0
Reputation: 19223
There's a closed-form solution for the sum: L = (N+1)!-1
The sum follows the recurrence L_N = N + L_(N-1) + N*L_(N-1), with L_0 = 0, which can be obtained by simply always choosing X = L_(N-1) and Y = N (= the next number to add).
Derivation:
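One way to get from the recurrence to the closed form, given here as a sketch rather than as the exact steps of the original derivation: adding 1 to both sides makes the right-hand side factor, and the resulting product telescopes down to L_0 + 1 = 1.

```latex
\begin{aligned}
L_N     &= N + L_{N-1} + N\,L_{N-1} \\
L_N + 1 &= (N + 1)\,(L_{N-1} + 1) \\
        &= (N + 1)\,N\,(L_{N-2} + 1) \\
        &= (N + 1)\,N \cdots 2\,(L_0 + 1) = (N + 1)! \\
L_N     &= (N + 1)! - 1
\end{aligned}
```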
EDIT:
As you posted your final code, I'm posting my benchmark:
#include <iostream>
#include <cstdint>
#include <chrono>
std::uint64_t
factorial(std::uint64_t n) {
std::uint64_t x = 1;
while (n > 1)
x = (x * n--) % 1'000'000'007;
return x;
}
int
main() {
std::uint64_t n;
std::cin >> n;
std::uint64_t numMicro = 0;
for (std::size_t i = 0; i < 1'000; ++i) {
auto start = std::chrono::high_resolution_clock::now();
volatile std::uint64_t res = factorial(n);
auto end = std::chrono::high_resolution_clock::now();
numMicro +=
std::chrono::duration_cast<std::chrono::microseconds>(end - start)
.count();
}
std::cout << "On average: " << numMicro / 1000.0 << " microseconds";
return 0;
}
Compiled with -O3; volatile is there only to make sure that the compiler does not optimize the computation away.
Your solution is almost the same and runs well below the 1-second limit, so I am not sure what there is to optimize further.
Upvotes: 3
Reputation: 77053
The algorithm should look like this:
sum <- 1
for index <- 2,n
    sum = (sum + index + sum * index) mod 1000000007
end for
Explanation: since + and * are commutative and associative, the order in which the items are handled is irrelevant, so you are doing a good job implementing this cycle, but you unnecessarily overcomplicate your cal function.
The other answers tell you to calculate ((n + 1)! - 1) mod modno, which is correct if we forget about the modulo part, but I doubt that calculating ((n + 1)! - 1) mod modno will yield the very same result as computing this in a step-by-step manner regardless of the value of n, because we have + and * in each step. If the other answerers are correct, then you can greatly optimize your algorithm. If not, then optimizing this is not as easy.
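The doubt above can be checked directly. Reduction modulo p is compatible with both + and *, so reducing after every step should give the same residue as reducing the closed form. The sketch below computes both and lets you compare them; the function names `stepwise` and `closed_form` are hypothetical, chosen for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t MOD = 1000000007ULL;

// Step-by-step version: fold sum + index + sum * index over 2..n,
// reducing modulo MOD after every step (starting from sum = 1).
std::uint64_t stepwise(std::uint64_t n) {
    std::uint64_t sum = 1;
    for (std::uint64_t index = 2; index <= n; ++index) {
        std::uint64_t idx = index % MOD;          // keep the product below 2^64
        sum = (sum + idx + sum * idx % MOD) % MOD;
    }
    return sum;
}

// Closed-form version: ((n + 1)! - 1) mod MOD.
std::uint64_t closed_form(std::uint64_t n) {
    std::uint64_t fact = 1;
    for (std::uint64_t i = 2; i <= n + 1; ++i)
        fact = fact * (i % MOD) % MOD;
    return (fact + MOD - 1) % MOD;                // add MOD so the subtraction cannot wrap
}
```

Comparing `stepwise(n)` with `closed_form(n)` over a range of n is a quick empirical check; for n = 3 both evaluate to 23, matching 4! - 1.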
Upvotes: 0