Reputation: 21
First, I'm very much a beginner :) I haven't studied elliptic curves in depth; I just googled some elementary facts about prime factorization and elliptic curves.
I'm trying to write a Python 3 implementation of the elliptic curve factoring algorithm. I just followed the description of Lenstra's Elliptic Curve Method, and with some functions, a class, and a custom exception I managed to build this code:
from random import randint
from math import e as exp
from math import sqrt, log
class InvError(Exception):
def __init__(self, v):
self.value = v
def inv(a,n):
r1, s1, t1 = 1, 0, a
r2, s2, t2 = 0, 1, n
while t2:
q = t1//t2
r1, r2 = r2, r1-q*r2
s1, s2 = s2, s1-q*s2
t1, t2 = t2, t1-q*t2
if t1!=1: raise InvError(t1)
else: return r1
class ECpoint(object):
def __init__(self, A,B,N, x,y):
if (y*y - x*x*x - A*x - B)%N != 0: raise ValueError
self.A, self.B = A, B
self.N = N
self.x, self.y = x, y
def __add__(self, other):
A,B,N = self.A, self.B, self.N
Px, Py, Qx, Qy = self.x, self.y, other.x, other.y
if Px == Qx and Py == Qy:
s = (3*Px*Px + A)%N * inv((2*Py)%N, N) %N
else:
s = (Py-Qy)%N * inv((Px-Qx)%N, N) %N
x = (s*s - Px - Qx) %N
y = (s*(Px - x) - Py) %N
return ECpoint(A,B,N, x,y)
def __rmul__(self, other):
r = self; other -= 1
while True:
if other & 1:
r = r + self
if other==1: return r
other >>= 1
self = self+self
def ECM(n):
x0 = 2
y0 = 3
bound = max(int(exp**(1/2*sqrt(log(n)*log(log(n))))),100)
while True:
try:
a = randint(1,n-1)
inv(a,n)
b = (y0*y0 - x0*x0*x0 - a*x0) %n
inv(b,n)
inv((4*a*a*a + 27*b*b)%n, n)
P = ECpoint(a,b,n, x0,y0)
for k in range(2, bound):
inv(P.x, n)
inv(P.y, n)
P = k*P
#print(k,P)
except InvError as e:
d = e.value
if d==n: continue
else: return d, n//d
print(ECM(int(input())))
This code takes a composite number as input and prints two nontrivial divisors.
The code runs well for most inputs. The problem is that it's too slow... I tested it on some 60-120 bit integers (such as 2^101-1, 10^30-33, etc.), and it turned out to be even slower than a roughly made Pollard's p-1 test like this one:
from math import gcd

def Pollard_pminus1(n):
if '0' not in bin(n)[3:]: base = 3
else: base = 2
if n % base == 0: return base, n//base
b = base; exp = 1
while True:
b = pow(b,exp,n)
d = gcd(b-1,n)
if d!=1: break
exp += 1
if d!=n: return d, n//d
And for inputs of roughly 50 digits (according to Wikipedia, this range should be the real playground for ECM...), the program stalls even after a full day.
Can I improve the performance of this code? Is it worth optimizing the bound for the value k, or should I rework a large part of the algorithm?
Thanks for your help, and sorry for the poor language (if something is wrong);;
Upvotes: 2
Views: 1727
Reputation: 16737
Thanks for a very interesting question!
I made some optimizations to your code which improved its speed around 95x. With them, factorization of a 166-bit number (50 decimal digits, as you wished) takes from 10 minutes (see the 11-minute log) to 4 hours on my laptop (the time varies depending on luck), and a 132-bit number (40 decimal digits) takes just 1 to 15 minutes. See the final code at the very end of this post.
The following improvements were made:
1) Used the multiprocessing module to occupy all CPU cores, instead of the single-core version of your code. This change alone made your code 8 times faster on my 4-core laptop (with 8 hardware threads).
2) Used the GMPY2 library to replace the pure Python int type with gmpy2.mpz(), which does all the mathematics 2-3 times faster than Python thanks to the highly optimized GMP library. Use of the gmpy2 module can be disabled (if you wish) by commenting out the line import gmpy2.
3) Replaced the body of your pure Python inv() function with a fast GMPY2-based implementation of the Extended Euclidean Algorithm, gmpy2.gcdext(). If gmpy2 use is disabled in my code (which is possible), then Python's built-in pow(a, -1, n) is used to invert an integer. gmpy2.gcdext() is faster than Python's pow(a, -1, n) and much faster than your inv() implementation of the Extended Euclidean Algorithm. In Python's C source, inverting a number through pow(a, -1, n) is done with an Extended Euclidean Algorithm similar to yours (you can see its source here); that version is faster than yours only because it is entirely C-based. Of course GMPY2's gmpy2.gcdext() is faster still, because it uses mathematically more sophisticated algorithms.
4) Replaced multiplication of the curve point by range(2, bound) with multiplication by only the prime numbers within the range [2; bound]. This is a significant improvement, because primes are much sparser than the full range of numbers, hence far fewer multiplications. To make this change possible I had to introduce a bound_pow constant such that prime ** m <= bound_pow; in other words, I multiply not by the prime itself but by its power up to some bound. The power of the prime is needed in case the order of a curve contains some prime factors with exponent higher than 1. (A minimal sketch of this prime-power stepping is shown just below this list.)
5) Removed the unnecessary inv(P.x, n) and inv(P.y, n), which take up significant time but do no useful work: these two inversions have no bigger chance of producing a factor than inverting any random number, so they are not useful at all.
6) Removed the unnecessary check of the elliptic curve invariant (y ** 2 - x ** 3 - A * x - B) % N == 0 inside the body of ECpoint's constructor, because this check runs on every __add__() call and can only fail if point addition is implemented incorrectly; hence it is just a debug check that slows things down.
7) Added quite a lot of visualization: printing of different values, timings and a progress meter to the console.
8) Implemented functions for computing many useful helper values, like the optimal value of bound instead of your approximate formula (which I didn't use, just printed), the optimal number of curves needed, the maximal factor size reachable (with a 10% probability of a miss) for a given bound and number of curves, and the amount of work (Work2()) needed for a given bound and number of curves.
Steps 1)-6) above all improve speed (steps 5)-6) are not very significant, only 1)-4) are important); you can reproduce them manually in your code, they are quite simple.
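To illustrate step 4), here is a minimal, self-contained sketch of the prime-power stepping. The helper names small_primes and stage1_multipliers here are mine, purely for illustration; the full script below does the same thing through get_prime() and prime_power() with caching:
def small_primes(limit):
    # naive trial division, good enough for an illustration
    primes = []
    for n in range(2, limit):
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
    return primes
def stage1_multipliers(bound, bound_pow):
    # Instead of multiplying the point by every k in range(2, bound), multiply
    # only by p**m for each prime p < bound, where p**m is the largest power below bound_pow.
    ks = []
    for p in small_primes(bound):
        m = p
        while m * p < bound_pow:
            m *= p
        ks.append(m)
    return ks
# With bound = 100 the old loop performs 98 point multiplications,
# the prime-power loop only 25 (one per prime below 100):
print(stage1_multipliers(100, 1 << 18)[:4])   # [131072, 177147, 78125, 117649]
The curve stage then simply does P = k * P for each of these multipliers, exactly as ProcessCurve() does with prime_power() in the full script.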
Example output for a 112-bit composite (factored within 148 seconds):
Factoring 2524582745259710504267101693459139
Calculating. Wait...
n 2^110.96, bound 2^13.18 (optimal 2^14.24, 1.051x faster),
need at most 974 curves (factor up to 2^55.48)
ooo
curves 1/974 ( 0.103%), factors < 2^14.508, time 00:00:20
ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
curves 86/974 ( 8.830%), factors < 2^41.646, time 00:01:20
ooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo
curves 179/974 ( 18.378%), factors < 2^45.995, time 00:02:21
oooooo
Factored on curve 190 (19.61%) (rand_seed = 248080385138786)!
ooo
Factors: P 38240987134592087, P 66017719060814197
2 prime, 0 composite
All factors are prime!!!
Time 148.2 sec
Example for a 166-bit number (50 digits), with random.seed(1); it takes 11.5 minutes to factor:
Factoring 45733304697523851846830687775886905451041736136239
Calculating. Wait...
n 2^164.97, bound 2^16.79 (optimal 2^17.43, 1.023x faster), need at most 5722 curves (factor up to 2^82.48)
........................................o
curves 1/5722 ( 0.017%), factors < 2^18.480, time 00:00:25
ooooooo........................................oooooooo........................................oooooooo........................................o
curves 25/5722 ( 0.437%), factors < 2^43.197, time 00:01:25
ooooooo........................................oooooooo........................................oooooooo......................................o
curves 49/5722 ( 0.856%), factors < 2^48.646, time 00:02:26
..ooooooo....................................o....ooooooo....................................o....ooooooo....................................o
curves 73/5722 ( 1.276%), factors < 2^51.779, time 00:03:27
...o.oooooo..................................o.....o.oooooo..................................o....o.o.ooooo..................................o
curves 97/5722 ( 1.695%), factors < 2^53.972, time 00:04:28
....o.oo.oooo..................................o....o.oooo.oo..................................o....o.ooo.ooo..................................o
curves 121/5722 ( 2.115%), factors < 2^55.655, time 00:05:29
.....oooo.ooo..................................o....o.ooo.oo.o.................................o....o.ooo.oo.o.................................o
curves 145/5722 ( 2.534%), factors < 2^57.017, time 00:06:29
....o.oooo.o.o.................................o....o.ooo.oo.o.................................o...o..oooo.o.o.................................o
curves 169/5722 ( 2.954%), factors < 2^58.161, time 00:07:30
o.....oooo.o.o.................................oo.....oooo.o.o.................................oo.....oo.ooo.o.................................o
curves 193/5722 ( 3.373%), factors < 2^59.145, time 00:08:31
o.....oo.ooo.o.................................oo.....oo.ooo.o.................................oo.....oooo..oo................................o
curves 217/5722 ( 3.792%), factors < 2^60.009, time 00:09:31
.o.....oooo..oo.................................oo.....oooo..o.o................................oo.....oooo..o.o...............................o
curves 241/5722 ( 4.212%), factors < 2^60.778, time 00:10:32
.o.....oooo..o.o...............................o.o.....oooo..o.o...............................o.o.....oooo..o.o...............................o
curves 265/5722 ( 4.631%), factors < 2^61.470, time 00:11:33
.o.....oooo..o.o...............................o.o....o.ooo..o.o............
Factored on curve 282 (4.95%) (rand_seed = 83739500771956)!
Factors: P 4856082635476451488745861, P 9417736091106851081336099
2 prime, 0 composite
All factors are prime!!!
Time 724.9 sec
For demo purposes, in my test() function I compose the composite number as a product of two prime numbers of equal bit length (same as in RSA). The bit length of the composite number is given by the bits constant.
All randomness in my code is deterministic thanks to the random.seed(0) call at the beginning of the script. This seed is fixed so that you can reproduce exactly the same deterministic results across several runs of the script. You may change this seed value if you wish to have other randomness and hence another generated composite and other curves. As you can see in the example console output above, at the end I show the rand_seed value of the successful curve that factored the number; this seed value is useful if you want to reproduce the exact parameters of that curve, to quickly redo the factoring or to give this successful curve to your friends.
The number of curves needed is computed such that you have exactly a 10% chance of missing a factor whose bit length equals half of the composite's bit length. In other words, in the worst case, when the factor is the largest possible, running all of these curves gives exactly a 10% chance of missing such a factor (a 90% chance of successful factorization). If you run twice as many curves, you get just a 1% chance of a miss (a 99% chance of success). If you want a 50% chance of factoring, you only have to check about 1/3 of the computed number of curves. Many times I got a successful factorization after checking just 1/10 of the needed number of curves; in that case the chance of hitting a factor is 10% (a 90% miss chance).
Usage of the GMPY2 library gives around a 1.8x overall speedup compared to pure Python integers, which is very significant, almost twice as fast. If you want a pure Python solution, just comment out import gmpy2 in the first lines of the script. On Linux you can install gmpy2 through sudo apt install python3-gmpy2. On Windows, download the wheel file from here for the corresponding Python version and install it through e.g. python -m pip install gmpy2-2.0.8-cp39-cp39-win_amd64.whl.
A further 20% speedup can be achieved if you compute the Non-Adjacent Form (wNAF) of the multiplier. This method greatly reduces the number of needed additions (from 50% of the bits for a random distribution of 1-bits down to 10%), while the number of doublings (adding a point to itself) stays the same. I implemented this method in my ECM library (a different script) and it gave me a 20% overall speedup. Also, to use this method you have to precompute the product of around 200 prime numbers in a row, unlike the current method where you can multiply by the primes one at a time.
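As a rough illustration only (the wNAF code mentioned above lives in a separate library and is not posted here), this is the standard non-adjacent-form recoding; plain NAF already cuts the average density of non-zero digits from about 1/2 to about 1/3, and wider windows (wNAF) reduce it further:
import random
def naf(k):
    # Non-adjacent form of k: digits in {-1, 0, +1}, least significant first,
    # with no two adjacent non-zero digits.
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)   # +1 if k % 4 == 1, -1 if k % 4 == 3
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
def nonzero_density(digits):
    return sum(d != 0 for d in digits) / len(digits)
random.seed(0)
k = random.getrandbits(256)    # some random 256-bit scalar
bits = [int(b) for b in bin(k)[2:]]
print(nonzero_density(bits), nonzero_density(naf(k)))
# Roughly 0.5 of positions are non-zero in plain binary vs roughly 0.33 in NAF,
# i.e. fewer point additions, while doublings stay the same. Using NAF digits
# for k*P also requires point negation (x, -y mod N), which is cheap on a curve.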
You can see that I precomputed and printed to the console the optimal bound, computed using the OptimalBoundLog() function. Using this bound should give around 3-5% more speedup, although I didn't use it in my code below.
Note: I also implemented my own elliptic curve factorization script in Python. That script is considerably longer and has point multiplication implemented in pure C++ (with a Cython wrapper). I timed the Python-only (GMPY2-based) multiplication against the C++ version, and they differ by 3x, so C++ gives an extra 3x boost.
An approximate calculation of the total speedup: 8x from multiprocessing (I have 8 cores at home), 1.8x from GMPY2, 2x from gmpy2.gcdext() instead of the Pythonic version of the Extended Euclidean Algorithm, around 3x from using prime numbers instead of consecutive numbers, 1.1x from removing unnecessary code like inv(P.x, n), 3x from using C++ for point multiplication (which I used only in my library, not posted here), and another 1.2x boost if you implement the Non-Adjacent Form (wNAF). Multiplying these together, 8 * 1.8 * 2 * 3 * 1.1 * 3 * 1.2 gives a total speedup of about 340x; the 95x figure mentioned above corresponds to the part implemented in the posted code (everything except the C++ and wNAF factors).
Also take into account that the actual running time depends greatly on your luck. Sometimes you have to check just 3% (if you're very lucky) of the computed number of needed curves before you find a factor; sometimes you have to check 200% of them (if you're very unlucky). Checking 100% of them gives a 90% probability of finding a factor (or a 10% probability of missing it). Of course the time also depends on the bit length of the smallest factor. The number of needed curves is computed for the worst case, when the factor's bit length equals half the bit length of the input composite number n.
At the very end of a successful factoring you can see something like Factored on curve 190 (19.61%) (rand_seed = 248080385138786)!. Here rand_seed is very significant: it uniquely identifies the curve that successfully factored the number. If you have a long-running factorization, remember this seed value at the end so that you can immediately reproduce the result of the factoring. To reproduce it, just feed the same composite number to the ECM() function and replace 'rand_seed': random.randrange(1 << 48), in the code with 'rand_seed': 248080385138786, (this seed value is taken from the example output above). Rerun the script and you should see the successful factoring result at the very beginning, on the 0-th curve. Very important: do this modification of rand_seed in the script only temporarily, and revert it back to 'rand_seed': random.randrange(1 << 48), after you're done, otherwise you'll destroy the source of randomness for other input numbers n, which will result in endless unsuccessful factorization runs. You can also make rand_seed an optional argument of the ECM() function so that you don't need to modify the code in place; a small sketch of why the saved seed is enough follows below.
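For example, a minimal sketch of why a saved rand_seed fully identifies a curve: re-seeding Python's random with it regenerates exactly the same curve parameters that ProcessCurve() draws (the helper name curve_params is mine, just for illustration):
import random
def curve_params(n, rand_seed):
    # Same parameter generation as ProcessCurve() in the script below:
    # re-seeding with the saved rand_seed reproduces the exact same curve.
    random.seed(rand_seed)
    x0, y0, a = [random.randrange(1, n) for _ in range(3)]
    b = (y0 * y0 - x0 ** 3 - a * x0) % n
    return a, b, x0, y0
n = 2524582745259710504267101693459139    # the 112-bit example above
seed = 248080385138786                    # rand_seed printed for the winning curve
print(curve_params(n, seed) == curve_params(n, seed))   # True: fully reproducible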
To play with the code below, try modifying two values. First, the bit length of the input composite number, which is set through the bits variable at the end of the script inside the test() function (right now it is 112 bits). Second, try modifying the line random.seed(0) at the very beginning of the script, changing the seed value to 1, 2, 3, etc. If you don't change this seed, you'll get exactly the same results every time you run the script. This seed controls the behaviour of all random values inside the script. The same fixed seed is needed if you want to reproduce exactly the same results of a run for some reason, like showing the results to your friends.
The improved code itself:
gmpy2 = None
import gmpy2
import random
random.seed(0)
class InvError(Exception):
def __init__(self, v):
self.value = v
def Int(x):
return int(x) if gmpy2 is None else gmpy2.mpz(x)
def inv(a, n):
a %= n
if gmpy2 is None:
try:
return pow(a, -1, n)
except ValueError:
import math
raise InvError(math.gcd(a, n))
else:
g, s, t = gmpy2.gcdext(a, n)
if g != 1:
raise InvError(g)
return s % n
# Slow version below
'''
r1, s1, t1 = 1, 0, a
r2, s2, t2 = 0, 1, n
while t2:
q = t1//t2
r1, r2 = r2, r1-q*r2
s1, s2 = s2, s1-q*s2
t1, t2 = t2, t1-q*t2
if t1!=1: raise InvError(t1)
else: return r1
'''
class ECpoint(object):
def __init__(self, A, B, N, x, y, *, prepare = True):
if prepare:
N = Int(N)
A, B, x, y = [Int(e) % N for e in [A, B, x, y]]
if (y ** 2 - x ** 3 - A * x - B) % N != 0:
raise ValueError
self.A, self.B, self.N, self.x, self.y = A, B, N, x, y
def __add__(self, other):
A, B, N = self.A, self.B, self.N
Px, Py, Qx, Qy = self.x, self.y, other.x, other.y
if Px == Qx and Py == Qy:
s = ((Px * Px * 3 + A) * inv(Py * 2, N)) % N
else:
s = ((Py - Qy) * inv(Px - Qx, N)) % N
x = (s * s - Px - Qx) % N
y = (s * (Px - x) - Py) % N
return ECpoint(A, B, N, x, y, prepare = False)
def __rmul__(self, other):
other = Int(other - 1)
r = self
while True:
if other & 1:
r = r + self
if other == 1:
return r
other >>= 1
self = self + self
def BinarySearch(f, a, b):
while a < b:
m = (a + b) // 2
if f(m):
b = m
else:
a = m + 1
assert a == b and f(a), (a, b, f(a))
return a
def FactorBitSize(bound, ncurves, miss_prob = 0.1):
import math
bound_log2 = math.log2(bound)
def Prob(factor_log2):
x = factor_log2 / bound_log2
return x ** -x
def F(factor_log2):
return (1. - Prob(factor_log2)) ** ncurves >= miss_prob
return BinarySearch(lambda x: F(x / 1000.), math.log2(bound) * 1000., 512 * 1000.) / 1000.
def NeededCurves(bound, target_fac_log2, miss_prob = 0.1):
def F(ncurves):
return FactorBitSize(bound, ncurves, miss_prob) >= target_fac_log2
return round(BinarySearch(lambda x: F(x), 1, 10 ** 15))
def Work(bound, bound_pow, *, logs = [0.], cache = {}):
if bound not in cache:
import math
for ilog in range(100):
if get_prime(1 << ilog) >= bound:
break
cnt_primes = BinarySearch(lambda i: get_prime(i) >= bound, 0, 1 << ilog)
bound_pow = max(bound, bound_pow)
bound_log = math.log2(bound)
bound_pow_log = math.log2(bound_pow)
while cnt_primes >= len(logs):
plog = math.log2(get_prime(len(logs)))
sum_plog = plog
while True:
sum_plog += plog
if sum_plog >= max(bound_log, bound_pow_log):
sum_plog -= plog
break
logs.append(logs[-1] + sum_plog)
cache[bound] = logs[cnt_primes]
return cache[bound]
def Work2(bound, bound_pow, factor_log):
return Work(bound, bound_pow) * NeededCurves(bound, factor_log)
def OptimalBoundLog(factor_log, bound_pow):
import math
mwork = None
bound_log_start = 10
for bound_log in range(bound_log_start, math.floor(factor_log) + 1):
bound = 2 ** bound_log
work = Work(bound, bound_pow) * NeededCurves(bound, factor_log)
if mwork is not None and work > mwork[0]:
break
if mwork is None or work < mwork[0]:
mwork = (work, bound_log)
else:
bound_log = bound_log_start + 1
mult = 200
mwork = None
for bound_log2 in range((bound_log - 3) * mult, bound_log * mult + 1):
bound = round(2 ** (bound_log2 / mult))
work = Work(bound, bound_pow) * NeededCurves(bound, factor_log)
if mwork is None or work < mwork[0]:
mwork = (work, bound_log2 / mult)
return mwork[1]
def get_prime(i, *, primes = [2, 3]):
while i >= len(primes):
for n in range(primes[-1] + 2, 1 << 62, 2):
isp = True
for p in primes:
if p * p > n:
break
if n % p == 0:
isp = False
break
if isp:
primes.append(n)
break
return primes[i]
def prime_power(idx, bound, bound_pow, *, cache = {}):
key = (bound, bound_pow)
if key not in cache:
bound_pow = max(bound, bound_pow)
r = []
for i in range(1 << 62):
p = get_prime(i)
if p >= bound:
break
m = p
while True:
m2 = m * p
if m2 >= bound_pow:
break
m = m2
r.append(m)
cache[key] = r
return cache[key][idx] if idx < len(cache[key]) else None
def ExcInfo(ex):
return f'{type(ex).__name__}: {ex}'
def ProcessCurve(*, n, bound, bound_pow, shared, rand_seed, curve_idx, num_curves):
try:
random.seed(rand_seed)
x0, y0, a = [random.randrange(1, n) for i in range(3)]
# x0 = 2; y0 = 3
b = (y0 ** 2 - x0 ** 3 - a * x0) % n
P = ECpoint(a,b,n, x0,y0)
for i in range(1 << 62):
if shared.get('finish', False):
return {'ex': 'Interrupted: Finishing...'}
k = prime_power(i, bound, bound_pow)
if i > 0 and i % 2000 == 0 or k is None:
Print(('.', 'o')[k is None], end = '', flush = True)
if k is None:
break
P = k * P
except InvError as e:
d = e.value
if d != n:
Print(f'\nFactored on curve {curve_idx} ({(curve_idx + 1) / num_curves * 100.:.02f}%) (rand_seed = {rand_seed})!')
return {'factors': sorted([d, n // d])}
except BaseException as ex:
shared['finish'] = True
return {'ex': ExcInfo(ex)}
return {}
def Print(*pargs, **nargs):
print(*pargs, **{'flush': True, **nargs})
def ECM(n, *, processes = None):
import math, multiprocessing as mp, time
from datetime import datetime as dt
Print(f'Factoring {n}')
if fermat_prp(n):
return [n]
Print('Calculating. Wait...')
start_time = dt.now()
bound = max(int(math.e**(1/2*math.sqrt(math.log(n)*math.log(math.log(n))))),100)
bound_pow = max(bound, 1 << 18)
factor_log = math.log2(n) / 2
max_curves = NeededCurves(bound, factor_log)
opt_bound_log = OptimalBoundLog(factor_log, bound_pow)
processes = processes or mp.cpu_count()
Print(f'n 2^{math.log2(n):.02f}, bound 2^{math.log2(bound):.02f} (optimal 2^{opt_bound_log:.02f}, ' +
f'{Work2(bound, bound_pow, factor_log) / Work2(2 ** opt_bound_log, bound_pow, factor_log):.03f}x faster), ' +
f'need at most {max_curves} curves (factor up to 2^{factor_log:.02f})')
with mp.Manager() as manager, mp.Pool(processes) as pool:
try:
ncurves, report_time = 0, 0
shared = manager.dict()
res = []
for icurve in range(1 << 62):
res.append(pool.apply_async(ProcessCurve, (),
{
'n': n, 'bound': bound, 'bound_pow': bound_pow,
'shared': shared, 'rand_seed': random.randrange(1 << 48),
'curve_idx': icurve, 'num_curves': max_curves,
}))
if len(res) < processes * 9:
continue
while len(res) >= processes * 6:
res2 = []
for e in res:
if not e.ready():
res2.append(e)
continue
e = e.get()
assert 'ex' not in e, e['ex']
if 'factors' in e:
return e['factors']
ncurves += 1
if time.time() - report_time >= 60:
Print(f'\ncurves {ncurves:>5}/{max_curves} ({ncurves / max_curves * 100.:>7.03f}%), factors ' +
f'< 2^{FactorBitSize(bound, ncurves):.03f}, ' +
f'time {(dt(2000, 1, 1) + (dt.now() - start_time)).strftime("%H:%M:%S"):>9}')
report_time = time.time()
res = res2
time.sleep(0.01)
except BaseException as ex:
Print(f'\nException: {ExcInfo(ex)}. Finishing, wait!')
finally:
shared['finish'] = True
pool.close()
pool.join()
Print()
return [n]
def fermat_prp(n, trials = 32):
# https://en.wikipedia.org/wiki/Fermat_primality_test
if n <= 16:
return n in (2, 3, 5, 7, 11, 13)
for i in range(trials):
if pow(random.randint(2, n - 2), n - 1, n) != 1:
return False
return True
def gen_random_prime(bits):
while True:
n = random.randrange(1 << (bits - 1), 1 << bits)
if fermat_prp(n):
return n
def Prod(it):
import functools
return functools.reduce(lambda x, y: x * y, it, 1)
def test():
import time
# First 190 digits of Pi
pi = 1415926535_8979323846_2643383279_5028841971_6939937510_5820974944_5923078164_0628620899_8628034825_3421170679_8214808651_3282306647_0938446095_5058223172_5359408128_4811174502_8410270193_8521105559_6446229489
bits = 112
nprimes = 2
n = Prod([gen_random_prime(bits // nprimes) for i in range(nprimes)])
tb = time.time()
fs = ECM(n)
Print('Factors:', ', '.join([('C', 'P')[fermat_prp(e)] + f' {e}' for e in fs]))
assert n == Prod(fs), (n, fs)
Print(sum(fermat_prp(e) for e in fs), 'prime,', sum(not fermat_prp(e) for e in fs), 'composite')
Print('All factors are prime!!!' if all(fermat_prp(e) for e in fs) else 'Composite factors remaining...')
Print(f'Time {time.time() - tb:.01f} sec')
if __name__ == '__main__':
test()
Upvotes: 4