Reputation: 1926
Say that, for some reason, I want to fit a linear regression using PyTorch, as illustrated below.
How could I compute the Hessian matrix of the model to, ultimately, compute the standard error for my parameter estimates?
import torch
import torch.nn as nn
# set seed
torch.manual_seed(42)
# define the model
class OLS_pytorch(nn.Module):
def __init__(self, X, Y):
super(OLS_pytorch, self).__init__()
self.X = X
self.Y = Y
self.beta = nn.Parameter(torch.ones(X.shape[1], 1, requires_grad=True))
self.intercept = nn.Parameter(torch.ones(1, requires_grad=True))
self.loss = nn.MSELoss()
def forward(self):
return self.X @ self.beta + self.intercept
def fit(self, lr=0.01, epochs=1000):
optimizer = torch.optim.Adam(self.parameters(), lr=lr)
for epoch in range(epochs):
optimizer.zero_grad()
loss = self.loss(self.forward(), self.Y)
loss.backward()
optimizer.step()
if epoch % 10 == 0:
print(f"Epoch {epoch} loss: {loss.item()}")
return self
Generating some data and using the model
# Generate some data
X = torch.randn(100, 1)
Y = 2 * X + 3 + torch.randn(100, 1)
# fit the model
model = OLS_pytorch(X, Y)
model.fit()
#extract parameters
model.beta, model.intercept
#Epoch 980 loss: 0.7803605794906616
#Epoch 990 loss: 0.7803605794906616
#(Parameter containing:
# tensor([[2.0118]], requires_grad=True),
# Parameter containing:
# tensor([3.0357], requires_grad=True))
For instance, in R, using the same data and the lm()
function, I recover the same parameters, but I am also able to recover the Hessian matrix, and them I am able to compute standard errors.
ols <- lm(Y ~ X, data = xy)
ols$coefficients
#(Intercept) X
# 3.035674 2.011811
vcov(ols)
# (Intercept) X
# (Intercept) 0.0079923921 -0.0004940884
# X -0.0004940884 0.0082671053
summary(ols)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 3.03567 0.08940 33.96 <2e-16 ***
# X 2.01181 0.09092 22.13 <2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
UPDATE: Using the answer from @cherrywoods
here is how you would match the standard errors produced by lm()
in R
# predict
y_pred = model.X @ model.beta + model.intercept
sigma_hat = torch.sum((y_pred - model.Y)**2)/ (N-2) #2 is the number of estimated parameters.
from torch.autograd.functional import hessian
def loss(beta, intercept):
y_pred = model.X @ beta + intercept
return model.loss(y_pred, model.Y)
H = torch.Tensor(hessian(loss, (model.beta, model.intercept)))
vcov = torch.sqrt(torch.diag(sigma_hat*torch.inverse(H/2)) )
print(vcov)
#tensor([0.9092, 0.8940], grad_fn=<SqrtBackward0>)
Upvotes: 2
Views: 646
Reputation: 1361
You can compute the Hessian using torch.autograd.functional.hessian.
from torch.autograd.functional import hessian
def loss(beta, intercept):
y_pred = model.X @ beta + intercept
return model.loss(y_pred, model.Y)
H = hessian(loss, (model.beta, model.intercept))
Upvotes: 1