Reputation: 5418
I have a matrix with observations in rows (measurements at different pH) and data points as columns (concentration over time), so one row consists of the data points for one pH.
I want to fit an ODE to the data, so I defined a cost function and would like to calculate the sum of squares over all observations. Taking the sum of squares for this matrix should work like:
res = y - yhat                     # calculate residuals
ssq = np.diag(np.dot(res.T, res))  # diagonal of res.T @ res
Is that correct?
Upvotes: 23
Views: 89684
Reputation: 1254
I'm a bit late to the party, but this is almost exactly what np.linalg.norm(res)
computes with its default arguments: the Frobenius norm of the matrix, which is the square root of the total sum of squares. Squaring it, np.linalg.norm(res)**2, gives the sum of squares itself.
Advantage: it avoids allocating a full new matrix for res**2
, which should make it faster, with a smaller RAM footprint, especially for big matrices.
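A quick sanity check of the equivalence, using a hypothetical residual matrix in place of res = y - yhat:

```python
import numpy as np

# Hypothetical residual matrix standing in for res = y - yhat
rng = np.random.default_rng(0)
res = rng.standard_normal((4, 10))

# The Frobenius norm is the square root of the total sum of squares,
# so it must be squared to recover the ssq:
ssq_norm = np.linalg.norm(res) ** 2
ssq_direct = np.sum(res**2)

print(np.isclose(ssq_norm, ssq_direct))  # True
```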
Upvotes: 2
Reputation: 133
Performance comparison follow-up: the fastest method I've found is to do the math directly:
res[:, 0]**2 + res[:, 1]**2 + res[:, 2]**2
import numpy as np
import perfplot

perfplot.live(
    setup=lambda n: np.random.randn(n, 3),
    kernels=[
        lambda a: a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2,
        lambda a: np.sum(a**2, axis=1),
        lambda a: np.sum(np.square(a), axis=1),
    ],
    labels=[
        "a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2",
        "np.sum(a**2, axis=1)",
        "np.sum(np.square(a), axis=1)",
    ],
    n_range=[2**k for k in range(25)],
    xlabel="len(a)",
)
Shout out to https://github.com/nschloe/perfplot
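Worth noting that the three kernels being benchmarked are numerically equivalent; a small check, using an arbitrary test array:

```python
import numpy as np

# Arbitrary test array with 3 columns, matching the benchmark setup
a = np.random.randn(1000, 3)

# The three benchmarked kernels all compute the per-row sum of squares
manual = a[:, 0]**2 + a[:, 1]**2 + a[:, 2]**2
summed = np.sum(a**2, axis=1)
squared = np.sum(np.square(a), axis=1)

print(np.allclose(manual, summed) and np.allclose(manual, squared))  # True
```

The manual version only wins for a small, fixed number of columns; for general matrices the axis-based reductions are the idiomatic choice.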
Upvotes: 4
Reputation: 680
If you took the sum of the last array, it would be correct. But it's also unnecessarily complex, because np.dot computes the off-diagonal elements as well, only to have them discarded. Faster is:
ssq = np.sum(res**2)
If you want the ssq for each experiment, you can do:
ssq = np.sum(res**2, axis=1)
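To illustrate both forms, here is a toy residual matrix (made-up numbers, two experiments with three time points each):

```python
import numpy as np

# Toy residual matrix: 2 experiments (rows) x 3 time points (columns)
res = np.array([[1.0, 2.0, 3.0],
                [0.0, 1.0, 2.0]])

total = np.sum(res**2)              # 1+4+9 + 0+1+4 = 19.0
per_exp = np.sum(res**2, axis=1)    # [14.  5.]

print(total)    # 19.0
print(per_exp)  # [14.  5.]
```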
Upvotes: 47