DeltaIV
DeltaIV

Reputation: 5646

How to speed up the construction of a matrix without using a double for loop

I want to the accelerate following double for loop:

n = 10000
A = np.zeros((n, n))
twopi = 2.0 * np.pi
for i in range(0, n):
    for j in range(0, n):
        A[i,j] = twopi * i*j/n

How could I do it? My idea: compute a vector v containing all the elements from 0 to n-1:

v = np.arange(n) 

then construct two matrices I and J, one having all rows equal to v and the other one having all columns equal to v. If I'm not mistaken, then I could substitute the double for loop with

A = twopi * I*J/n

Is this correct? How do I build I and J? Are there better ways?

Upvotes: 0

Views: 94

Answers (4)

inspectorG4dget
inspectorG4dget

Reputation: 114015

In [118]: matrix = np.array([np.arange(n) for _ in range(n)])

In [119]: matrix
Out[119]:
array([[   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       ...,
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999],
       [   0,    1,    2, ..., 9997, 9998, 9999]])

In [120]: matrix [0,1]
Out[120]: 1

In [121]: for j in range(n): matrix[:,j] *= j

In [122]: matrix
Out[122]:
array([[       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       ...,
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001],
       [       0,        1,        4, ..., 99940009, 99960004, 99980001]])

In [123]: matrix = matrix/n

In [124]: matrix = matrix * np.pi*2

In [125]: matrix
Out[125]:
array([[0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       ...,
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04],
       [0.00000000e+00, 6.28318531e-04, 2.51327412e-03, ...,
        6.27941596e+04, 6.28067228e+04, 6.28192873e+04]])

Same thing as a script:

matrix = np.array([np.arange(n) for _ in range(n)])
for j in range(n): matrix[:,j] *= j
matrix = matrix/n
matrix = matrix * np.pi*2

Upvotes: 1

Whole Brain
Whole Brain

Reputation: 2167

An answer

You can indeed use numpy's broadcasting ability.

import numpy as np
n = 10000
twopi = 2.0 * np.pi / n
A = np.arange(n) * np.arange(n).reshape(-1, 1) * twopi

The reshape of the second np.arange() drives the product to broadcast.

It's very similar to calling np.outer() as Dr.V did, even from a time performance perspective.

A benchmark

Config:

import numpy as np
n = 10000
twopi = 2 * np.pi / n

Times:

# List comprehension
A = np.array([   
    [twopi*i*j for j in range(n)] 
for i in range(n)])
# timeit > 10.2 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# Outer
i = np.arange(n)
A = twopi * np.outer(i, i)
# timeit > 234 ms ± 6.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Product
A = np.arange(n) * np.arange(n).reshape(-1, 1) * twopi
# timeit > 208 ms ± 41.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As said, the outer and product methods are very similar. But the list comprehension is not because it defeats the purpose of numpy which is to vectorize the array computations with distributed sub-tasks (and, very often, low-level optimizations).

Upvotes: 1

Dr. V
Dr. V

Reputation: 1914

Something like this:

from numpy import outer, pi, arange

n = 10
i = arange(n)
A = 2 * pi / n * outer(i, i)

The outer product does what you ask for.

Edit: A simple way to test performance is the following:

from time import time

n = 10000
t = time()
for _ in range(10):
    i = arange(n)
    A = 2 * pi / n * outer(i, i)

t = time() - t
print("Time used [s]", t)

For me, it uses 5 seconds for these 10 repetitions, and there is no significant difference between range and arange.

Upvotes: 3

Acccumulation
Acccumulation

Reputation: 3591

You can use list comprehensions:

A = np.array([   
        [twopi*i*j/n for j in range(n)] 
    for i in range(n)])

The default is for range to start at zero, so range(0,n) rather than range(n)is redundant.

Upvotes: 1

Related Questions