numpy matrix algebra best practice

Question

My question is regarding the last line below: mu@sigma@mu. Why does it work? Is a one-dimensional ndarray treated as a row vector or a column vector? Either way, shouldn't it be mu.T@sigma@mu or mu@sigma@mu.T? I know mu.T still returns mu since mu only has one dimension, but still, the interpreter seems to be too smart.

>> import numpy as np
>> mu = np.array([1, 1])
>> print(mu)

[1 1]

>> sigma = np.eye(2) * 3
>> print(sigma)

[[ 3.  0.]
 [ 0.  3.]]

>> mu@sigma@mu

6.0

More generally, which is a better practice for matrix algebra in python: use ndarray and @ to do matrix multiplication as above (cleaner code), or use np.matrix and the overloaded * as below (mathematically less confusing)

>> import numpy as np
>> mu = np.matrix(np.array([1, 1]))
>> print(mu)

[[1 1]]

>> sigma = np.matrix(np.eye(2) * 3)
>> print(sigma)

[[ 3.  0.]
 [ 0.  3.]]

>> a = mu * sigma * mu.T
>> a.item((0, 0))

6.0

hpaulj · Accepted Answer

Python performs chained operations left to right:

In [32]: mu=np.array([1,1])
In [33]: sigma= np.array([[3,0],[0,3]])
In [34]: mu@sigma@mu
Out[34]: 6

is the same as doing two expressions:

In [35]: temp=mu@sigma
In [36]: temp.shape
Out[36]: (2,)
In [37]: temp@mu
Out[37]: 6

In my comments (deleted) I claimed @ was just doing np.dot. That's not quite right. The documentation describes the handling of 1d arrays differently. But the resulting shapes are the same:

In [38]: mu.dot(sigma).dot(mu)
Out[38]: 6
In [39]: mu.dot(sigma).shape
Out[39]: (2,)

For 1d and 2d arrays, np.dot and @ should produce the same result. They differ in handling higher dimensional arrays.

Historically numpy has used arrays, which can be 0d, 1d, and on up. np.dot was the original matrix multiplication method/function.

np.matrix was added, largely as a convenience for wayward MATLAB programmers. It only allows 2d arrays (just as the old, 1990s MATLAB). And it overloads __mat__ (*) with

def __mul__(self, other):
    if isinstance(other, (N.ndarray, list, tuple)) :
        # This promotes 1-D vectors to row vectors
        return N.dot(self, asmatrix(other))
    if isscalar(other) or not hasattr(other, '__rmul__') :
        return N.dot(self, other)
    return NotImplemented

Mu*sigma and Mu@sigma behave the same, though the calling tree is different

In [48]: Mu@sigma@Mu
...
ValueError: shapes (1,2) and (1,2) not aligned: 2 (dim 1) != 1 (dim 0)

Mu*sigma produces a (1,2) matrix, which cannot matrix multiply a (1,2), hence the need for a transpose:

In [49]: Mu@sigma@Mu.T
Out[49]: matrix([[6]])

Note that this is a (1,1) matrix. You have to use item if you want a scalar. (In MATLAB there isn't such a thing as a scalar. Everything has a shape/size.)

@ is a relatively recent addition to Python and numpy. It was added to Python as an unimplemented operator. numpy (and possibly other packages) has implemented it.

It makes chained expressions possible, though I don't have any problems with the chained dot in [38]. It is more useful when handling higher dimensional cases.

This addition means there is one less reason to use the old np.matrix class. (Matrix like behavior is more deeply ingrained in the scipy.sparse matrix classes.)

If you want 'mathematical purity' I'd suggest taking the mathematical physics approach, and use Einstein notation - as implemented in np.einsum.

With arrays this small, the timings reflect the calling structure more than the actually number of calculations:

In [57]: timeit mu.dot(sigma).dot(mu)
2.79 µs ± 7.75 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [58]: timeit mu@sigma@mu
6.29 µs ± 31.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [59]: timeit Mu@sigma@Mu.T
17.1 µs ± 134 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [60]: timeit Mu*sigma*Mu.T
17.7 µs ± 517 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Note that the 'old-fashioned' dot is fastest, while both matrix versions are slower.

On the transpose

I might add that in an expression like mu@sigma@mu.T, the .T is evaluated first, so the @ (matmul) does not know that there was an attempt to transpose mu. It just uses the result of the transpose. Remember it's the Python interpreter that parses the expression, not the numpy functions. So if mu.T does nothing to mu, as is the case with 1d arrays, the .T will have no effect in the @ either.

transpose is well defined for 2d arrays. The numpy transpose has been written in such a way that it also works with arrays of any dimension. It changes the order of the axes, but never adds a dimension. There are other tools for doing that, such as reshape and the np.newaxis indexing.

In Octave, transpose is defined only for 2-d objects

>> ones(2,3,4)'
error: transpose not defined for N-d objects

1-d objects don't exist in MATLAB.

In [270]: np.ones((2,3,4),int).T.shape
Out[270]: (4, 3, 2)

Inner and Outer products

np.dot is quite explicit about how it handles 1d arrays, 'vectors':

If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).

In matmul the description is more convoluted, but the result is the same.

Regarding your comment:

A = [1, 2] and B = [3, 5] being (2, ) ndarray, A@B could mean [1, 2] * [3, 5]' = 13, or it could mean [1, 2]' * [3, 5] = [[3, 5], [6, 10]]

The various ways of taking a product in numpy are:

In [273]: A = np.array([1,2]); B = np.array([3,5])

Element multiplication (.* in MATLAB)

In [274]: A*B
Out[274]: array([ 3, 10])

Inner product

In [275]: A@B       # same as np.dot(A,B)
Out[275]: 13

Outer product

In [276]: np.outer(A,B)
Out[276]: 
array([[ 3,  5],
       [ 6, 10]])

Another way to get the inner product:

In [278]: np.sum(A*B)
Out[278]: 13

With Einstein notation (from math physics), both inner and outer:

In [280]: np.einsum('i,i',A,B)
Out[280]: array(13)
In [281]: np.einsum('i,j',A,B)
Out[281]: 
array([[ 3,  5],
       [ 6, 10]])

Using broadcasting to get the outer

In [282]: A[:,None]*B[None,:]
Out[282]: 
array([[ 3,  5],
       [ 6, 10]])

What you intend by [1, 2]' is implemented in numpy as A[:,None], turning the 1d array into a column vector (2d).

Octave added broadcasting long after numpy had it. I don't know if MATLAB has advanced that far or not. :)

@ doesn't do broadcasting, but can work with the column or row vectors:

In [283]: A@B[:,None]       # your imagined A*B'
Out[283]: array([13])

To get the outer with @ I need to add a dimension to make A a column vector, and B a row vector:

In [284]: A[:,None]@B
ValueError: shapes (2,1) and (2,) not aligned: 1 (dim 1) != 2 (dim 0)
In [285]: A[:,None]@B[None,:]
Out[285]: 
array([[ 3,  5],
       [ 6, 10]])

When the arrays really are 2d, and that includes the case where one dimension is 1, the @/dot behavior isn't that different from common matrix conventions. It's when dimension is 1, or greater than 2, that intuitions from MATLAB fail, in part because MATLAB doesn't handle those.

This Octave error suggests that n-d matrices just piggy back on 2d ones. They aren't real n-d in the numpy sense.

>> ones(2,3,4) * ones(2,3,4)
error: operator *: nonconformant arguments (op1 is 2x12, op2 is 2x12)

numpy matrix algebra best practice

Answers (2)

On the transpose

Inner and Outer products

Related Questions