Reputation: 434
I have to find the sum of the first 4 digits, the sum of the last 4 digits and compare them (of all the numbers betweem m and n). But when I submit my solution online there's a problem with the time limit.
Here's my code:
#include <stdio.h>
int main()
{
int M, N, res = 0, cnt, first4, second4, sum1, sum2;
scanf("%d", &M);
scanf("%d", &N);
for(cnt = M; cnt <= N; cnt++)
{
first4 = cnt % 10000;
sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
second4 = cnt / 10000;
sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;
if(sum1 == sum2)
res++;
}
printf("%d", res);
return 0;
}
I'm trying to find a more efficient way to do this.
Upvotes: 1
Views: 3532
Reputation: 1434
I am going to try an approach which doesn't make use of the lookup table (even though I know that the second one should be faster) to investigate how much we can speedup just optimizing calculus. This algorithm can be used where stack is an important resource...
Let's work on the idea that divisions and modulus are slow, for example in cortex R4 a 32 bit division requires up to 16 loops while a multiplication can be done in a single loop, with older ARMs things can be even worse.
This basic idea will try to get rid of them using digit arrays instead of integers. To keep it simple let's show an implementation using printf before a pseudo optimized
version.
void main() {
int count=0;
int nmax;
char num[9]={0};
int n;
printf( "Insert number1 ");
scanf( "%d", &nm );
printf( "Insert number2 ");
scanf( "%d", &nmax );
while( nm <= nmax ) {
int sumup=0, sumdown=0;
sprintf( num, "%d", nm );
for( n=0; n<4; n++ ) {
sumup += num[n] -'0'; // subtracting '0' is not necessary (see below)
sumdown += num[7-n]-'0'; // subtracting '0' is not necessary (see below)
}
if( sumup == sumdown ) {
/* whatever */
count++;
}
nm++;
}
}
You may want to check that the string is a valid number using strtol
before calling the for loop and the length of the string using strlen
. I set here fixed values as you required (I assume length always 8).
The downside of the shown algorithm is the sprintf
for any loop that may do thing worse... So we apply two major changes
Finally the pseudo optimized algorithm should look something like the one shown below in which all divisions and modules are removed (apart from the first number) and bytes are used instead of ASCII.
void pseudo_optimized() {
int count=0;
int nmax,nm;
char num[9]={0};
int sumup=0, sumdown=0;
int n,i;
printf( "Insert number1 ");
scanf( "%d", &nm );
printf( "Insert number2 ");
scanf( "%d", &nmax );
n = nm;
for( i=7; i>=0; i-- ) {
num[i]=n%10;
n/=10;
}
while( nm <= nmax ) {
sumup = num[0] + num[1] + num[2] + num[3];
sumdown = num[7] + num[6] + num[5] + num[4];
if( sumup == sumdown ) {
/* whatever */
count++;
}
nm++;
/* Following loop is a faster sprintf replacement and
* it will exit at the first value 9 times on 10
*/
for( i=7; i>=0; i-- ) {
if( num[i] == 9 ) {
num[i]=0;
} else {
num[i] += 1;
break;
}
}
}
}
Original algo on my vm 5.500000 s, this algo 0.950000 s tested for [00000000=>99999999] The weak point of this algorithm is that it uses sum of digits (which are not necessary and a for...loop that can be unrolled.
* update * further optimization. The sums of digits are not necessary.... thinking about it I could improve the algorithm in the following way:
int optimized() {
int nmax=99999999,
int nm=0;
clock_t time1, time2;
char num[9]={0};
int sumup=0, sumdown=0;
int n,i;
int count=0;
n = nm;
time1 = clock();
for( i=7; i>=0; i-- ) {
num[i]=n%10;
n/=10;
}
sumup = num[0] + num[1] + num[2] + num[3];
sumdown = num[7] + num[6] + num[5] + num[4];
while( nm <= nmax ) {
if( sumup == sumdown ) {
count++;
}
nm++;
for( i=7; i>=0; i-- ) {
if( num[i] == 9 ) {
num[i]=0;
if( i>3 )
sumdown-=9;
else
sumup-=9;
} else {
num[i] += 1;
if( i>3 )
sumdown++;
else
sumup++;
break;
}
}
}
time2 = clock();
printf( "Final-now %d %f\n", count, ((float)time2 - (float)time1) / 1000000);
return 0;
}
with this we arrive to 0.760000 s which is 3 times slower than the result achieved on the same machine using lookup tables.
* update* Optimized and unrolled:
int optimized_unrolled(int nm, int nmax) {
char num[9]={0};
int sumup=0, sumdown=0;
int n,i;
int count=0;
n = nm;
for( i=7; i>=0; i-- ) {
num[i]=n%10;
n/=10;
}
sumup = num[0] + num[1] + num[2] + num[3];
sumdown = num[7] + num[6] + num[5] + num[4];
while( nm <= nmax ) {
if( sumup == sumdown ) {
count++;
}
nm++;
if( num[7] == 9 ) {
num[7]=0;
if( num[6] == 9 ) {
num[6]=0;
if( num[5] == 9 ) {
num[5]=0;
if( num[4] == 9 ) {
num[4]=0;
sumdown=0;
if( num[3] == 9 ) {
num[3]=0;
if( num[2] == 9 ) {
num[2]=0;
if( num[1] == 9 ) {
num[1]=0;
num[0]++;
sumup-=26;
} else {
num[1]++;
sumup-=17;
}
} else {
num[2]++;
sumup-=8;
}
} else {
num[3]++;
sumup++;
}
} else {
num[4]++;
sumdown-=26;
}
} else {
num[5]++;
sumdown-=17;
}
} else {
num[6]++;
sumdown-=8;
}
} else {
num[7]++;
sumdown++;
}
}
return count;
}
Unrolling vectors improves the speed of about 50%. The algorithm costs now 0.36000 s, by the way it makes use of the stack a bit more than the previous solution (as some 'if' statements may result in a push, so it cannot be always used). The result is comparable with Alg2@Michael Burr on the same machine, [Alg3-Alg5]@Michael Burr are a lot faster where stack isn't a concern.
Note all test where performed on a intel VMS. I will try to run all those algos on a ARM device if I will have time.
Upvotes: 2
Reputation: 699
Finally, if you are still interested, there is a much faster way to do this. Your task doesn't specifically require you to calculate the sums for all the numbers, it only asks for the number of some special numbers. In such cases optimization techniques like memoization or dynamic programming come really handy.
In this case, when you have the first four digits of some number (let them be 1234), you calculate their sum (in this case 10) and you immediately know, what is the sum of the other four digits supposed to be.
Any 4-digit number, that yields sum 10 can now be the other half to create a valid number. Therefore total number of valid numbers beginning with 1234 is exactly the number of all four digit numbers that give the sum 10.
Now consider another number, say 3412. This number has also sum equal to 10, therefore any right-side that completes 1234 also completes 3412.
What this means is that the number of valid numbers beginning with 3412 is the same as the number of valid numbers beginning with 1234, which is in turn the same as the total number of valid numbers, where the first half yields the sum 10.
Therefore if we precompute for each i
the number of four digit numbers
that yield the sum i
, we would know for each first four digits the exact number of
combinations of last four digits that complete a valid number,
without having to iterate over all 10000 of them.
The following implementation of this algorithm
This algorithm runs quadratically faster than the naive implementation (for sufficiently big N-M
).
#include <string.h>
int sum_digits(int number) {
return number%10 + (number/10)%10 + (number/100)%10 + (number/1000)%10;
}
int count(int M, int N) {
if (M > N) return 0;
int ret = 0;
int tmp = 0;
// for each i from 0 to 36 precompute number of ways we can get this sum
// out of a four-digit number
int A[37];
memset(A, 0, 37*4);
for (int i = 0; i <= 9999; ++i) {
++A[sum_digits(i)];
}
// nearest multiple of 10000 greater than M
int near_M = ((M+9999)/10000)*10000;
// nearest multiple of 10000 less than N
int near_N = (N/10000)*10000;
// count all numbers up to first multiple of 10000
tmp = sum_digits(M/10000);
if (near_M <= N) {
for (int i = M; i < near_M; ++i) {
if (tmp == sum_digits(i % 10000)) {
++ret;
}
}
}
// count all numbers between the 10000 multiples, use the precomputed values
for (int i = near_M / 10000; i < near_N / 10000; ++i) {
ret += A[sum_digits(i)];
}
// count all numbers after the last multiple of 10000
tmp = sum_digits(N / 10000);
if (near_N >= M) {
for (int i = near_N; i <= N; ++i) {
if (tmp == sum_digits(i % 10000)) {
++ret;
}
}
}
// special case when there are no multiples of 10000 between M and N
if (near_M > near_N) {
for (int i = M; i <= N; ++i) {
if (sum_digits(i / 10000) == sum_digits(i % 10000)) {
++ret;
}
}
}
return ret;
}
EDIT: I fixed the bugs mentioned in the comments.
Upvotes: 5
Reputation: 40145
#include <stdio.h>
int main(){
int M, N;
scanf("%d", &M);
scanf("%d", &N);
static int table[10000] = {0,1,2,3,4,5,6,7,8,9};
{
register int i=0,i1,i2,i3,i4;
for(i1=0;i1<10;++i1)
for(i2=0;i2<10;++i2)
for(i3=0;i3<10;++i3)
for(i4=0;i4<10;++i4)
table[i++]=table[i1]+table[i2]+table[i3]+table[i4];
}
register int cnt = M, second4 = M % 10000;
int res = 0, first4 = M / 10000, sum1=table[first4];
for(; cnt <= N; ++cnt){
if(sum1 == table[second4])
++res;
if(++second4>9999){
second4 -=10000;
if(++first4>9999)break;
sum1 = table[first4];
}
}
printf("%d", res);
return 0;
}
Upvotes: 1
Reputation: 340198
I don't know if this would be significantly faster or not, but you might try breaking the number into two 4 digit numbers, then use a table lookup to get the sums. That way there's only one division operation instead of eight.
You can pre-compute the table of 10000 sums so it gets compiled in so there's no runtime cost at all.
Another slightly more complicated, but probably much faster, approach that can be used is have a table or map of 10000 elements that's the reverse of the sum lookup table where you can map the sum to the set of four digit numbers that would produce that sum. That way, when you have to find the result for a particular range 10000 number range, it's a simple lookup on the sum of the most significant four digits. For example, to find the result for the range 12340000 - 12349999, you could use a binary search on the reverse lookup table to quickly find how many numbers in the range 0 - 9999 have the sum 10 (1 + 2 + 3 + 4).
Again - this reverse sum lookup table can be pre-computed and compiled in as a static array.
In this way, the results for complete 10000 number ranges are performed with a couple binary searches. Any partial ranges can also be handled with the reverse lookup table with slightly more complication due to having to ignore matches that are from out of the range of interest. But that complication only has to happen at most twice for your whole set of subranges.
This would reduce the complexity of the algorithm from O(N*N) to O(N log N) (I think).
update:
Here's some timings I got (Win32-x86, using VS 2013 (MSVC 12) with release build default options):
range range
start end count time
================================================
alg1(10000000, 99999999): 4379055, 1.854 seconds
alg2(10000000, 99999999): 4379055, 0.049 seconds
alg3(10000000, 99999999): 4379055, 0.001 seconds
with:
alg1()
is the original code from the questionalg2()
is my first cut suggestion (lookup precomputed sums)alg3()
is the second suggestion (binary search lookup of sum matches using a table sorted by sums)I'm actually surprised at the difference between alg1()
to alg2()
Upvotes: 5
Reputation: 141
i found faster algorithm:
#include <stdio.h>
#include <ctime>
int main()
{
clock_t time1, time2;
int M, N, res = 0, cnt, first4, second4, sum1, sum2,last4_ofM,first4_ofM,last4_ofN,first4_ofN,j;
scanf("%d", &M);
scanf("%d", &N);
time1 = clock();
for(cnt = M; cnt <= N; cnt++)
{
first4 = cnt % 10000;
sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
second4 = cnt / 10000;
sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;
if(sum1 == sum2)
res++;
}
time2 = clock();
printf("%d\n", res);
printf("first algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
res=0;
time1 = clock();
first4_ofM = M / 10000;
last4_ofM = M % 10000;
first4_ofN = N / 10000;
last4_ofN = N % 10000;
for(int i = first4_ofM; i <= first4_ofN; i++)
{
sum1 = i % 10 + (i / 10) % 10 + (i / 100) % 10 + (i / 1000) % 10;
if ( i == first4_ofM )
j = last4_ofM;
else
j = 0;
while ( j <= 9999)
{
sum2 = j % 10 + (j / 10) % 10 + (j / 100) % 10 + (j / 1000) % 10;
if(sum1 == sum2)
res++;
if ( i == first4_ofN && j == last4_ofN ) break;
j++;
}
}
time2 = clock();
printf("%d\n", res);
printf("second algorithm time: %f\n",((float)time2 - (float)time1) / 1000000.0F );
return 0;
}
i just dont need to count sum of the first four digits all the time the number in changed. I need to count it one time per 10000 iterations. In worst case output is:
10000000
99999999
4379055
first algorithm time: 5.160000
4379055
second algorithm time: 2.240000
about half the better result.
Upvotes: 0
Reputation: 46365
You are going about this the wrong way. A little bit of cleverness is worth a lot of horsepower. You should not be comparing the first and last four digits of every number.
First - notice that the first four digits will change very slowly - so for sure you can have a loop of 10000 of the last four digits without re-computing the first sum.
Second - the sum of digits repeats itself every 9th number (until you get overflow). This is the basis of the "number is divisible by 9 if sum of digits is divisible by 9". example:
1234 - sum = 10
1234 + 9 = 1243 - sum is still 10
What this means is that the following will work pretty well (pseudo code):
take first 4 digits of M, find sum (call it A)
find sum of last four digits of M (call it B)
subtract: C = (A - B)
If C < 9:
D = C%9
first valid number is [A][B+D]. Then step by 9, until...
You need to think a bit about the "until", and also about what to do when C >= 9. This means you need to find a zero in B and replace it with a 9, then repeat the above.
If you want to do nothing else, then see that you don't need to re-compute the sum of digits that did not change. In general when you add 1 to a number, the sum of digits increases by 1 (unless there is carry - then it decreases by 9; and that happens every 9th, 99th (twice -> sum drops by 18), 999th (drop by 27), etc.
I hope this helps you think about the problem differently.
Upvotes: 3
Reputation: 6731
If you know that the numbers are fixed like that, then you can you substring functions to get the components and compare them. Otherwise, your modulator operations are contributing unnecessary time.
Upvotes: 0