Reputation: 5418
I have a matrix with observations in rows (measurements at different pH) and data points as columns (concentration over time), so one row consists of the data points for one pH.
I want to fit an ODE to the data, so I defined a cost function and would like to calculate the sum of squares over all observations. Taking the sum of squares for this matrix should work like:
res = y - yhat                     # calculate residuals
ssq = np.diag(np.dot(res.T, res))  # diagonal of res.T @ res
Is that correct?
Upvotes: 23
Views: 89684
Reputation: 1254
I'm a bit late to the party, but this is almost exactly what np.linalg.norm(res)
computes with its default arguments: the Frobenius norm of the matrix, which is the square root of the total sum of squares. Squaring it, np.linalg.norm(res)**2, gives the sum of squares itself.
Advantage: it avoids allocating a full new matrix for res**2
, which should make it faster, with a smaller RAM footprint, especially for big matrices.
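A quick sanity check of the equivalence, using a hypothetical residual matrix in place of res = y - yhat:

```python
import numpy as np

# Hypothetical residual matrix standing in for res = y - yhat
rng = np.random.default_rng(0)
res = rng.standard_normal((4, 10))

# The Frobenius norm is the square root of the total sum of squares,
# so it must be squared to recover the ssq:
ssq_norm = np.linalg.norm(res) ** 2
ssq_direct = np.sum(res**2)

print(np.isclose(ssq_norm, ssq_direct))  # True
```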
Upvotes: 2
Reputation: 133
Performance comparison follow-up: the fastest method I've found is to do the math directly:
res[:, 0]**2 + res[:, 1]**2 + res[:, 2]**2
import numpy as np
import perfplot

perfplot.live(
    setup=lambda n: np.random.randn(n, 3),
    kernels=[
        lambda a: a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2,
        lambda a: np.sum(a**2, axis=1),
        lambda a: np.sum(np.square(a), axis=1),
    ],
    labels=[
        "a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2",
        "np.sum(a**2, axis=1)",
        "np.sum(np.square(a), axis=1)",
    ],
    n_range=[2**k for k in range(25)],
    xlabel="len(a)",
)
Shout out to https://github.com/nschloe/perfplot
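Worth noting that the three kernels being benchmarked are numerically equivalent; a small check, using an arbitrary test array:

```python
import numpy as np

# Arbitrary test array with 3 columns, matching the benchmark setup
a = np.random.randn(1000, 3)

# The three benchmarked kernels all compute the per-row sum of squares
manual = a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2
summed = np.sum(a**2, axis=1)
squared = np.sum(np.square(a), axis=1)

print(np.allclose(manual, summed) and np.allclose(manual, squared))  # True
```

The manual version only wins for a small, fixed number of columns; for general matrices the axis-based reductions are the idiomatic choice.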
Upvotes: 4
Reputation: 680
If you took the sum of the last array, it would be correct. But it's also unnecessarily complex, because np.dot computes the off-diagonal elements as well, only to have them discarded. Faster is:
ssq = np.sum(res**2)
If you want the ssq for each experiment, you can do:
ssq = np.sum(res**2, axis=1)
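To illustrate both forms, here is a toy residual matrix (made-up numbers, two experiments with three time points each):

```python
import numpy as np

# Toy residual matrix: 2 experiments (rows) x 3 time points (columns)
res = np.array([[1.0, 2.0, 3.0],
                [0.0, 1.0, 2.0]])

total = np.sum(res**2)              # 1+4+9 + 0+1+4 = 19.0
per_exp = np.sum(res**2, axis=1)    # [14.  5.]

print(total)    # 19.0
print(per_exp)  # [14.  5.]
```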
Upvotes: 47