Nathaniel Fleming
Nathaniel Fleming

Reputation: 43

Issues factoring large number that is 99 digits long

The number is

112887987371630998240814603336195913423482111436696007401429072377238341647882152698281999652360869

My code is below

def getfactors(number):
    factors = []
    for potentialFactor in range(1 , int(math.sqrt(number)) + 1):
        if number % potentialFactor == 0:
            factors.append(potentialFactor)
    return factors    

and the input is

getfactors(112887987371630998240814603336195913423482111436696007401429072377238341647882152698281999652360869)

The program has been running for at least 3 hours and I still have no results from it yet. The code works with other numbers too. Is there any algorithm or method that I could use to speed this up?

Upvotes: 3

Views: 852

Answers (3)

Joseph Wood
Joseph Wood

Reputation: 7607

I have authored a library for the R programming language called RcppBigIntAlgos that can factor these types of numbers in a reasonable amount of time.

As others have pointed out, your number is going to be very difficult to factor by any program you roll yourself unless you invest a considerable amount of time into it. For me personally, I have invested thousands of hours. Your mileage may vary.

Here is my test run in a vanilla R console (no special IDE):

numDig99 <- "112887987371630998240814603336195913423482111436696007401429072377238341647882152698281999652360869"
## install.packages("RcppBigIntAlgos") if necessary
library(RcppBigIntAlgos)

prime_fac <- quadraticSieve(numDig99, showStats = TRUE, nThreads = 8)

Summary Statistics for Factoring:
#>     112887987371630998240814603336195913423482111436696007401429072377238341647882152698281999652360869
#> 
#> |      MPQS Time     | Complete | Polynomials |   Smooths  |  Partials  |
#> |--------------------|----------|-------------|------------|------------|
#> |  6h 37m 48s 335ms  |   100%   |   11591331  |    8768    |    15707   |
#> 
#> |  Mat Algebra Time  |    Mat Dimension   |
#> |--------------------|--------------------|
#> |      27s 745ms     |    24393 x 24475   |
#> 
#> |     Total Time     |
#> |--------------------|
#> |  6h 38m 17s 401ms  |

And just as @kelalaka, we obtain the same hashed values (Again, ran right in the R console):

system(sprintf("printf %s | openssl sha512", prime_fac[1]))
#> faebc6b3645d45f76c1944c6bd0c51f4e0d276ca750b6b5bc82c162e1e9364e01aab42a85245658d0053af526ba718ec006774b7084235d166e93015fac7733d

system(sprintf("printf %s | openssl sha512", prime_fac[2]))
#> 27c64b5b944146aa1e40b35bd09307d04afa8d5fa2a93df9c5e13dc19ab032980ad6d564ab23bfe9484f64c4c43a993c09360f62f6d70a5759dfeabf59f18386

Also, to show that the factorization is indeed correct:

## The product of the result is indeed the original number
prod(prime_fac) == numDig99
#> [1] TRUE

## There are two factors
length(prime_fac)
#> [1] 2

## They are prime. The gmp package uses the Miller-Rabin primality test
gmp::isprime(prime_fac)
#> [1] 1 1

The algorithm implemented in RcppBigIntAlgos::quadraticSieve is the Multiple Polynomial Quadratic Sieve. There is a more efficient version of the quadratic sieve known as the Self Initializing Quadratic Sieve, however, there isn't as much literature in the wild available.

Here are my machine specs:

MacBook Air (2022)
Processor: Apple M2
Memory; 24 GB

And here is my R info:

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.4.1
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: America/New_York
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] RcppBigIntAlgos_1.1.0
#> 
#> loaded via a namespace (and not attached):
#> [1] compiler_4.3.1 gmp_0.7-2

Upvotes: 1

kelalaka
kelalaka

Reputation: 5636

Your method will take a lot of time to factor in the given number since the RSA primes are close to each other. Even sieving with Sieve of Eratosthenes won't help, since you have a 326-bit number. Can you sieving to 163-bit, there is no way. This is slightly larger than the first RSA challenge RSA-100 that has 300-bit.

Use existing libraries like the


The experiments

  1. I have tried with Pollard's p-1 algorithm, still running for one and a half-day and did not produce a result, yet. This is what expected due to the B bound must be around 2^55 with success probability 1/27. I've stopped the experiment after the CADO-NFS succeeds. This is self-implemented Pollard's p-1, one can find an optimized in GMP-ECM

  2. Tried the CADO-NFS. The stable version may not be easily compiled for new machines, so prefer the active one from the GitLab.

    After ~6 hours with 4 cores, CADO-NFS produced the result. As expected this is an RSA CTF/Challange. Since I don't want to spoil the fun; here the hash commitments with SHA-512, it is executed with OpenSSL;

    echo -n "prime x" | openssl sha512

27c64b5b944146aa1e40b35bd09307d04afa8d5fa2a93df9c5e13dc19ab032980ad6d564ab23bfe9484f64c4c43a993c09360f62f6d70a5759dfeabf59f18386

faebc6b3645d45f76c1944c6bd0c51f4e0d276ca750b6b5bc82c162e1e9364e01aab42a85245658d0053af526ba718ec006774b7084235d166e93015fac7733d

Details of the experiment

  • CPU : Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

  • RAM : 32GB - doesn't require much ram, at least during polynomial selection and Sieveing.

  • Dedicated cores : 4

  • Test machine Ubuntu 20.04.1 LTS

  • CUDA - NO

  • gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

  • cmake version 3.16.3

  • external libraries: Nothing out of Ubuntu's canonicals

  • CODA-NFS version : GitLab develepment version cloned at 23-01-2021

  • The bit sizes;

    • n has 326 bits ( RSA-100 challenge has 330 and broken by Lenstra in 1991)
    • p has 165 bits
    • q has 162 bits

The cado-nfs-2.3.0 did not compile and giving errors about HWLOC- HWLOC_TOPOLOGY_FLAG_IO_DEVICES. Asked a friend to test the compile and it worked for them. It was an older Linux version. So I decided to use the GitLab version.

Upvotes: 11

rossum
rossum

Reputation: 15693

What do you know about this number? If it is an RSA public key then it only has two large prime factors. If it is a random number then it will very probably have small prime factors. The type of number will determine how you want to approach factorising it.

Two ancillary functions will also be useful. First a Sieve of Eratosthenes to build a list of primes up to, say 50,000 or some convenient limit. Second a large number prime test, such as Miller-Rabin, to check if the residue is prime or not.

Use the sieve of Eratosthenes to give you all the small primes up to a convenient limit. Test for each prime in turn up to the square root of the target number. When you find a prime that divides the test number, divide the test number to make it smaller. A prime may divide in more than once. Reset the prime limit to the square root of the smaller number once all the divisions are finished.

if (numToTest MOD trialFactor = 0)
  repeat
    listOfFactors.add(trialFactor)
    numToTest <- numToTest/trialFactor
  until (numToTest MOD trialFactor != 0)
  primeLimit <- sqrt(numTotest)
endif

Once the number you are testing has been reduced to 1, you have completely factored it.

If you run out of primes before completely factoring the target, it is worth running the Miller-Rabin test 64 times to see if the remainder is itself prime; potentially that can save you a lot of work trying to find non-existent factors of a large prime. If the remainder is composite then you can either try again with a larger sieve or use one of the heavy duty factoring methods: Quadratic Sieve, Elliptic Curve etc.

Upvotes: 2

Related Questions