pmdaly

Reputation: 1212

How to vectorize hinge loss gradient computation

I'm computing thousands of gradients and would like to vectorize the computation in Python. The context is an SVM, and the loss function is the hinge loss. y is Mx1, X is MxN, and w is Nx1.

 L(w) = lam/2 * ||w||^2 + 1/m * Sum_{i=1..m} max(0, 1 - y[i]*X[i]*w)

The gradient of this is

grad = lam*w + 1/m * Sum_{i=1..m} { -y[i]*X[i].T if y[i]*X[i]*w < 1, else 0 }

Instead of looping through each element of the sum and evaluating the max function, is it possible to vectorize this? I want to use something like np.where, as in the following

grad = np.where(y*X.dot(w) < 1, -X.T.dot(y), 0)

This does not work: the condition y*X.dot(w) < 1 has one entry per sample (shape (M,)), but -X.T.dot(y) is a single N-vector, so the shapes don't line up where the condition is true.
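To make the mismatch concrete (the sizes here are arbitrary, just for illustration):

import numpy as np

M, N = 5, 3
X = np.random.randn(M, N)
y = np.random.choice([-1.0, 1.0], M)
w = np.random.randn(N)

print((y * X.dot(w) < 1).shape)   # (5,): one boolean per sample
print((-X.T.dot(y)).shape)        # (3,): a single N-vector, already summed over the samples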

edit: here is a list comprehension version; I would like to know if there's a cleaner or more optimal way

import numpy as np

def grad(X, y, w, lam):
    # cache y[i]*X[i].dot(w); y and w are assumed 1-D, shapes (M,) and (N,)
    yXw = y*X.dot(w)
    # cache y[i]*X[i]: each row of X is multiplied by the matching element of y
    yX = X*y[:, np.newaxis]
    # average the per-sample subgradients row-wise (axis=0) so the result is (N,);
    # use a zero vector, not the scalar 0, to keep the list entries the same shape
    return lam*w + np.mean([-yX[i] if yXw[i] < 1 else np.zeros(X.shape[1])
                            for i in range(len(y))], axis=0)
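For example, calling it with random data of the stated shapes (the sizes are arbitrary):

M, N = 6, 4
X = np.random.randn(M, N)
y = np.random.choice([-1.0, 1.0], M)   # labels in {-1, +1}
w = np.random.randn(N)
lam = 0.1

print(grad(X, y, w, lam))              # gradient vector, shape (N,)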

Upvotes: 1

Views: 5178

Answers (1)

lejlot

Reputation: 66805

You have two vectors A and B, and you want to return an array C such that C[i] = A[i] if B[i] < 1 and 0 otherwise. Consequently, all you need to do is

C := A * sign(max(0, 1-B)) # surprisingly similar to the original hinge loss, right? :)

since

  • if B < 1, then 1-B > 0, thus max(0, 1-B) > 0 and sign(max(0, 1-B)) == 1
  • if B >= 1, then 1-B <= 0, thus max(0, 1-B) == 0 and sign(max(0, 1-B)) == 0

so in your code it will be something like the following (here A is a matrix whose rows are the -y[i]*X[i] terms, so the mask is broadcast across the rows):

B = (y*X.dot(w)).ravel()           # margins B[i] = y[i]*X[i].dot(w), shape (M,)
A = -X*y[:, np.newaxis]            # rows A[i] = -y[i]*X[i], shape (M, N)
C = A * np.sign(np.maximum(0, 1 - B))[:, np.newaxis]
grad = lam*w + C.mean(axis=0)      # average the rows and add the regularizer
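As a quick sanity check (random data; grad is the loop version from the question), the vectorized gradient should match the loop up to floating-point error:

import numpy as np

M, N = 200, 10
X = np.random.randn(M, N)
y = np.random.choice([-1.0, 1.0], M)
w = np.random.randn(N)
lam = 0.1

B = (y*X.dot(w)).ravel()
A = -X*y[:, np.newaxis]
C = A * np.sign(np.maximum(0, 1 - B))[:, np.newaxis]
vec_grad = lam*w + C.mean(axis=0)

print(np.allclose(vec_grad, grad(X, y, w, lam)))   # expect: True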

Upvotes: 2
