Reputation: 21253
I can make a polynomial and square it with:
from numpy.polynomial import Polynomial
P=Polynomial([1,2,3])
P**2
I will also need to access individual coefficients and to truncate polynomials; both operations are supported by numpy.
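For example, coefficient access and truncation look something like this (coef and truncate are the standard Polynomial attribute and method):
from numpy.polynomial import Polynomial

P = Polynomial([1, 2, 3])   # 1 + 2*x + 3*x**2
Q = P**2                    # 1 + 4*x + 10*x**2 + 12*x**3 + 9*x**4
Q.coef                      # array of coefficients, lowest degree first
Q.truncate(3)               # keep only the 3 lowest-degree terms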
However, in my case the coefficients are going to be very large and will need a lot of precision as well. Decimal and mpmath both support arbitrary precision but they don't support polynomial multiplication as far as I can tell.
Is there any support for polynomial multiplication in Python without having to implement it myself, perhaps using one of these modules?
In my case I need it as fast as possible. AFAICT mpmath doesn't support polynomial multiplication.
Upvotes: 1
Views: 356
Reputation: 27190
Use Arb, accessible in Python for example via python-flint.
>>> import flint
>>> flint.ctx.dps = 100
>>> A = flint.arb_poly(list(range(10000000)))
>>> B = A**2
>>> B[10000000]
166666666666665000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000
Let's say you want to square a length-1000 polynomial with mpmath coefficients at 1000-digit precision. Then you can do something like this:
>>> import mpmath
>>> import flint
>>>
>>> mpmath.mp.dps = 1000
>>> flint.ctx.dps = 1000
>>>
>>> poly = [mpmath.sqrt(i) for i in range(1000)]
>>> A = flint.arb_poly(poly)
>>>
>>> B = A ** 2
>>>
>>> mpmath.mpf(B[1000])
mpf('392685.9346252675985159413676771703266071184826459632081633190560895316298035241138430247489627160899748676297275241408801524133378178815477143404171012554652425237085115594106094896102500629296880905425022936881947985831789453090942867266616153032668865995178945425417030910072070101276176899084962913639871126284083779960031998269231131603585931017925505407161173080304089529394450040256817273448223780684647793102484840269111446410776461250440183492417255982239769656457877884170620711608117523186247568842825785753875665030904736364231937498735050432226951575289364585259272644271306207565517685121910365380901053479027727533242166603285535370458037253490677830242054140638164581578659233650071122733992948028135839726984756007834869899357269398026265706051227301385552289563833997499763226603902090404633373836370051178765133702099779815308572973213210868003740849296264312288288544994494805013564201555822870780045939348897447083439565619705884701779691680615979177739099576734946338023039699974169')
Here the polynomial multiplication takes 0.01 seconds on my machine.
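If you want to sanity-check the flint result against plain mpmath, the coefficient of x^1000 in A**2 is just the convolution sum of the original coefficients, so (continuing the session above; the 1e-990 tolerance is only a rough guess for how many of the 1000 digits should survive rounding) something like this should pass:
>>> check = mpmath.fsum(poly[i] * poly[1000 - i] for i in range(1, 1000))
>>> assert abs(check - mpmath.mpf(B[1000])) / check < mpmath.mpf('1e-990')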
Upvotes: 1
Reputation: 16747
Very interesting task you have, thanks!
I decided to implement a from-scratch solution in pure Python + Cython + C++.
My answer is more educational/scientific; it is better not to use it in production unless it is well tested, although I have already created quite good tests for it.
I achieved a 25x speedup of polynomial multiplication in my algorithm compared to the school-grade algorithm; this 25x figure is for multiplying two random polynomials with 1000 coefficients each, every coefficient 2000 bits in size. Bigger polynomials give a much bigger speedup.
For simplicity of implementation I solved only a sub-task: polynomials whose coefficients are large positive integers. It could be generalized to big floating-point coefficients too, but I don't yet have a full idea of how to do that.
I used the so-called Discrete Fourier Transform (DFT), or more precisely its integer counterpart, the Number Theoretic Transform (NTT), which works modulo a prime (i.e. in a finite ring).
The Fourier transform and the NTT are well described on this page. While writing my code I was mostly inspired by this description. One more interesting resource about the NTT is here.
As is well known, Fourier/NTT multiplication has complexity O(N * log(N)) when multiplying two polynomials with approximately N bits in total, while school-grade polynomial multiplication needs O(N^2) operations. So Fourier is N / log(N) times faster, which is a really large gain, especially for very large polynomials.
Why did I choose NTT instead of DFT? Only because the DFT has precision error: as the polynomials grow, the error grows too and eventually produces an incorrect answer, while the NTT has no error at all. The NTT is several times slower than a floating-point DFT, but it is beneficial because it is exact.
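To make the idea concrete, here is a minimal pure-Python sketch of NTT-based polynomial multiplication. This is not my real Cython/C++ code, just an illustration; the prime 998244353 with primitive root 3 is a standard small NTT-friendly choice I picked only for the sketch. It multiplies two length-1000 polynomials modulo that prime and checks the result against the school-grade algorithm. Everything here stays modulo a single small prime; to get exact products of 2000-bit coefficients one needs much larger primes and/or several moduli combined via the Chinese Remainder Theorem; that part is omitted here.
import random

MOD = 998244353      # NTT-friendly prime: 119 * 2**23 + 1
ROOT = 3             # primitive root modulo MOD

def ntt(a, invert=False):
    # In-place iterative radix-2 NTT over Z/MOD; len(a) must be a power of two.
    n = len(a)
    j = 0
    for i in range(1, n):                     # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # inverse root for the inverse transform
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u = a[k]
                v = a[k + length // 2] * w % MOD
                a[k] = (u + v) % MOD
                a[k + length // 2] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def ntt_multiply(p, q):
    # Multiply two coefficient lists (lowest degree first), result modulo MOD.
    n = 1
    while n < len(p) + len(q) - 1:
        n <<= 1
    fa = list(p) + [0] * (n - len(p))
    fb = list(q) + [0] * (n - len(q))
    ntt(fa)
    ntt(fb)
    for i in range(n):
        fa[i] = fa[i] * fb[i] % MOD
    ntt(fa, invert=True)
    return fa[:len(p) + len(q) - 1]

def school_grade(p, q):
    # O(n^2) reference multiplication modulo MOD, used only to check correctness.
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] = (out[i + j] + x * y) % MOD
    return out

p = [random.randrange(MOD) for _ in range(1000)]
q = [random.randrange(MOD) for _ in range(1000)]
assert ntt_multiply(p, q) == school_grade(p, q)
Even this unoptimized pure-Python version should already show the N * log(N) versus N^2 scaling gap if you increase the polynomial sizes.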
Besides implementing the ideas from the wiki pages above, I also did a lot of optimization at the C++ level; I even had to use some Intel intrinsics functions to get extra speed.
My code needs some tweaking before it is run, because it compiles everything from sources. For simplicity the following steps are for Windows + MSVC only, but my code can easily be adapted to Linux + GCC/Clang (or Windows Clang):
1. Install Python. I tested with version 3.9.
2. Install Visual Studio 2022 Community.
3. Install VCPKG.
4. Inside VCPKG, install the Boost and GMP libraries by running vcpkg install boost gmp inside the vcpkg directory cloned in step 3. You can read the docs for GMP and Boost; I used them only for 128-bit arithmetic and for temporarily storing polynomial coefficients.
5. In the Python source code below, modify all library paths: search for the strings bin2/Python39/include/ and dev/_3party/vcpkg/installed/ and change them to your installed directories.
6. Install the needed pip packages too, through python -m pip install cython setuptools numpy.
7. Now just run the Python code. It automatically compiles everything and then runs the timing benchmarks.
My code runs timing benchmarks: it creates two very large polynomials, each with 1000 coefficients (degree), where every coefficient is a random 2000-bit integer. It then multiplies them both with the school-grade algorithm and with my NTT technique, and checks correctness by comparing the school-grade answer and the fast NTT answer for equality.
My Python code contains the C++ code pasted inside it as one huge string. You can copy out just that C++ code and run it separately if you like: it is fully self-contained, has a main() function, and can be compiled as a standalone C++ program with either MSVC or Clang. See the second GitHub Gist link below with this C++ code only.
My code is so large (about 46 KB) that it doesn't fit into StackOverflow's post limit of 30,000 characters, so I hosted it separately in the Gist linked below:
(As an extra, I also provide a C++-only version.)
Example console output of the Python version:
Cython module: cy003EB71B1A92
Regular 19.614
NTT 0.781
OK
Boost 25.114x
Tests finished!
Example console output of the C++-only version:
Test FindNttMod
FindNttEntry<T>{.k = 57, .c = 29, .p = 4179340454199820289, .g = 3, .root = 68630377364883, .plog2 = 61.86},
FindNttEntry<T>{.k = 54, .c = 177, .p = 3188548536178311169, .g = 7, .root = 3055434446054240334, .plog2 = 61.47},
FindNttEntry<T>{.k = 54, .c = 163, .p = 2936346957045563393, .g = 3, .root = 83050791888939419, .plog2 = 61.35},
FindNttEntry<T>{.k = 55, .c = 69, .p = 2485986994308513793, .g = 5, .root = 1700750308946223057, .plog2 = 61.11},
FindNttEntry<T>{.k = 54, .c = 127, .p = 2287828610704211969, .g = 3, .root = 878887558841786394, .plog2 = 60.99},
FindNttEntry<T>{.k = 55, .c = 57, .p = 2053641430080946177, .g = 7, .root = 640559856471874596, .plog2 = 60.83},
FindNttEntry<T>{.k = 56, .c = 27, .p = 1945555039024054273, .g = 5, .root = 1613915479851665306, .plog2 = 60.75},
FindNttEntry<T>{.k = 53, .c = 161, .p = 1450159080013299713, .g = 3, .root = 359678689516082930, .plog2 = 60.33},
FindNttEntry<T>{.k = 53, .c = 143, .p = 1288029493427961857, .g = 3, .root = 531113314168589713, .plog2 = 60.16},
FindNttEntry<T>{.k = 55, .c = 35, .p = 1261007895663738881, .g = 6, .root = 397650301651152680, .plog2 = 60.13},
0.027 sec
Test MultBigInt 1.132 sec
Test CompareNttMultWithReg
Time NTT 0.044 Reg 0.646
Swap 0.432 (Slow 0.000) ToMontg 0.040 Main 1.191 (0.170, 1.021) Invert 0.000
Swap 0.467 (Slow 0.000) ToMontg 0.100 Main 1.142 (0.162, 0.981) Invert 0.000
MidMul 0.065
Swap 0.297 (Slow 0.000) ToMontg 0.000 Main 1.137 (0.170, 0.966) Invert 0.047
Time NTT 5.313 Reg 0.000
6.383 sec
Upvotes: 1