Ryan

Reputation: 8241

Using vectorize to apply function to each row in Numpy 2d array

I have a 1000x784 matrix of data (10000 examples and 784 features) called X_valid and I'd like to apply the following function to each row in this matrix and get the numerical result:

def predict_prob(x_valid, cov, mean, prior):
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(
    np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(
    prior)

(x_valid is simply a row of data). I'm using numpy's vectorize to do this with the following code:

v_predict_prob = np.vectorize(predict_prob)
scores = v_predict_prob(X_valid, covariance[num], means[num], priors[num])

(covariance[num], means[num], and priors[num] are just constants.)

However, I get the following error when running this:

File "problem_5.py", line 48, in predict_prob
return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + mean.T.dot(np.linalg.inv(cov)).dot(mean) + np.linalg.slogdet(cov)[1]) + np.log(prior)
AttributeError: 'numpy.float64' object has no attribute 'dot'

That is, it's not passing in each row of the matrix individually. Instead, it is passing in each entry of the matrix (not what I want).

How can I alter this to get the desired behavior?

Upvotes: 6

Views: 3168

Answers (3)

Andrew Pye

Reputation: 622

I know this question is a bit outdated, but I thought I would provide an answer for 2020. Since the release of numpy 1.12, np.vectorize has an optional argument, "signature", which lets the wrapped function receive whole 1-D slices (here, the rows of a 2D array) instead of scalars. Additionally, you will want to list the constants in "excluded" so that they are not vectorized.

All you would need to change is:

v_predict_prob = np.vectorize(predict_prob, excluded=['cov', 'mean', 'prior'], signature='(n)->()')

The signature says that the function takes a 1-D array of length n and returns a scalar, and cov, mean, and prior will not be vectorized. Because they are excluded by name, pass them as keyword arguments when calling: v_predict_prob(X_valid, cov=covariance[num], mean=means[num], prior=priors[num]).
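A small runnable sketch of the signature approach follows. The array sizes and values are made-up stand-ins for X_valid, covariance[num], means[num], and priors[num], not the asker's data. Note that np.vectorize spells the keyword `excluded`, and that excluding arguments by name only takes effect when they are passed by keyword.

```python
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    # Same expression as in the question; x_valid is a single 1-D row.
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

# Hypothetical small inputs (assumed shapes: 2-D cov, 1-D mean, scalar prior).
rng = np.random.default_rng(0)
X_valid = rng.normal(size=(5, 3))
cov = np.eye(3) * 2.0
mean = np.array([0.5, -1.0, 2.0])
prior = 0.1

# '(n)->()' : each call receives a length-n 1-D array and returns a scalar.
v_predict_prob = np.vectorize(
    predict_prob,
    excluded=['cov', 'mean', 'prior'],
    signature='(n)->()',
)

# The constants are passed by keyword so the name-based exclusion applies.
scores = v_predict_prob(X_valid, cov=cov, mean=mean, prior=prior)

# Reference result from a plain loop over the rows.
expected = np.array([predict_prob(row, cov, mean, prior) for row in X_valid])
```

Here scores has one entry per row of X_valid and should match the plain loop.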

Upvotes: 1

hpaulj

Reputation: 231375

vectorize is NOT a general substitute for iteration, nor does it claim to be faster. It mainly streamlines access to the numpy broadcasting functionality. In general the function that you vectorize will take scalar inputs, not rows or 1d arrays.

I don't think there is a way of configuring vectorize to pass an array to your function as opposed to an item.
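The element-by-element behavior of a plain np.vectorize call (no signature argument) can be seen with a small sketch. The function record_ndim and the array X are made up for illustration. otypes is given so that vectorize does not need a trial call to work out the output type.

```python
import numpy as np

seen_ndims = []

def record_ndim(x):
    # Record the dimensionality of whatever vectorize hands us.
    seen_ndims.append(np.ndim(x))
    return 0.0

X = np.arange(6, dtype=float).reshape(2, 3)

# Plain vectorize: the function is called once per entry, not once per row.
result = np.vectorize(record_ndim, otypes=[float])(X)
```

Every recorded value is 0, meaning the function only ever saw scalars, and the result has the same shape as X rather than one value per row.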

You describe X_valid as a 2d array that you want to evaluate row by row, and the other terms as 'constants' that you select with [num]. What shape are those constants?

Your function treats a lot of these terms as 2d arrays:

x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid) + 
mean.T.dot(np.linalg.inv(cov)).dot(mean) + 
np.linalg.slogdet(cov)[1]) + np.log(prior)

x_valid.T is meaningful only if x_valid is 2d. If it is 1d, the transpose does nothing.

np.linalg.inv(cov) only makes sense if cov is 2d.

mean.T.dot... assumes mean is 2d.

np.linalg.slogdet(cov)[1] assumes np.linalg.slogdet(cov) has 2 or more elements (or rows).

You need to show us that the function works with some real arrays before jumping into iteration or 'vectorize'.
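One way to run that kind of check is a single call with tiny arrays. The shapes here (1-D row, 2-D cov, 1-D mean, scalar prior) are assumptions, since the question does not state them, and the values are chosen only to make the result easy to verify by hand.

```python
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    # Same expression as in the question.
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

x = np.array([1.0, 2.0, 3.0])   # one row, shape (3,)
cov = np.eye(3)                 # assumed 2-D covariance
mean = np.zeros(3)              # assumed 1-D mean
prior = 0.5                     # assumed scalar prior

# Transposing a 1-D array leaves its shape unchanged.
transpose_is_noop = x.T.shape == x.shape

# slogdet returns a (sign, log-determinant) pair; [1] picks the log-determinant.
sign, logdet = np.linalg.slogdet(cov)

score = predict_prob(x, cov, mean, prior)
```

With these inputs the quadratic term is 14 and the other two terms inside the parentheses are 0, so score works out to -7 + log(0.5), a single scalar.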

Upvotes: 2

Bob Baxley

Reputation: 3751

I suggest just using a for loop:

def v_predict_prob(X_valid, c, m, p):
    out = []
    for row in X_valid:
        out.append(predict_prob(row, c, m, p))
    return np.array(out)

Under the hood, np.vectorize is doing essentially the same thing (the docs describe its implementation as essentially a for loop): http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.vectorize.html
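If the loop turns out to be slow, a different technique is to drop the per-row call altogether: compute the inverse once and evaluate the quadratic term for all rows with np.einsum. This is only a sketch of that alternative, not what np.vectorize does. The name predict_prob_rows and the test data are made up, and it assumes a 2-D cov, a 1-D mean, and a scalar prior.

```python
import numpy as np

def predict_prob(x_valid, cov, mean, prior):
    # Per-row function from the question, kept as the reference.
    return -0.5 * (x_valid.T.dot(np.linalg.inv(cov)).dot(x_valid)
                   + mean.T.dot(np.linalg.inv(cov)).dot(mean)
                   + np.linalg.slogdet(cov)[1]) + np.log(prior)

def predict_prob_rows(X, cov, mean, prior):
    # Hypothetical whole-matrix version: invert cov once, not once per row.
    cov_inv = np.linalg.inv(cov)
    # x^T cov_inv x for every row i of X, all at once.
    quad = np.einsum('ij,jk,ik->i', X, cov_inv, X)
    # Terms that do not depend on the row.
    const = mean.dot(cov_inv).dot(mean) + np.linalg.slogdet(cov)[1]
    return -0.5 * (quad + const) + np.log(prior)

# Made-up data with a symmetric positive-definite covariance.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
A = rng.normal(size=(3, 3))
cov = A @ A.T + 3.0 * np.eye(3)
mean = rng.normal(size=3)
prior = 0.25

fast = predict_prob_rows(X, cov, mean, prior)
slow = np.array([predict_prob(row, cov, mean, prior) for row in X])
```

Both arrays should agree to floating-point tolerance; the einsum version avoids recomputing np.linalg.inv(cov) for every row.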

Upvotes: 1
