Reputation: 111
I have an LSTM model that takes a sequence of 3 temperature readings and outputs the next value in the series.
input => [array([0.20408163, 0.40816327, 0.6122449 ]),
array([0.40816327, 0.6122449 , 0.81632653])]
output=> [tensor(0.81632653, dtype=torch.float64),
tensor(0.91667510, dtype=torch.float64)]
Now, I want to combine this LSTM model with a Physics-Informed Neural Network (PINN) based on Newton’s Law of Cooling. The idea is to predict the temperature with the LSTM, then compute the derivative of the predicted temperature with respect to time so that the physics law can be added to the loss function.
However, when I try to compute the gradient of the LSTM output with respect to time (t), the gradient returned is None. I’m not sure if I’m using torch.autograd correctly for this purpose.
Here is a simplified version of my code:
import torch
import torch.nn as nn

def create_lstm_model(input_size, hidden_size, num_layers, output_size):
    class LSTMModel(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, output_size):
            super(LSTMModel, self).__init__()
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
            self.fc = nn.Linear(hidden_size, output_size)

        def forward(self, x):
            h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
            c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
            out, _ = self.lstm(x, (h0, c0))
            out = self.fc(out[:, -1, :])
            return out

    return LSTMModel(input_size, hidden_size, num_layers, output_size)
def physics_loss_autograd(outputs, time_step):
    """Compute the physics-informed loss using autograd to get dT/dt."""
    # k and T_ambient are not defined in this simplified snippet
    # Compute dT/dt using autograd
    dT_dt = torch.autograd.grad(outputs, time_step, grad_outputs=torch.ones_like(outputs), create_graph=True)[0]
    # Newton's law of cooling: dT/dt = -k(T - T_ambient)
    residual = dT_dt + k * (outputs - T_ambient)
    # Physics loss is the mean squared residual
    physics_loss = torch.mean(residual**2)
    return physics_loss

t = torch.arange(0,100)
input_size = 1
hidden_size = 64
num_layers = 1
output_size = 1

# Create an instance of the LSTMModel using the function
model = create_lstm_model(input_size, hidden_size, num_layers, output_size)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

num_epochs = 100
for epoch in range(num_epochs):
    total_loss = 0
    for inputs, targets in train_loader:
        inputs, targets = inputs.float(), targets.float()  # Convert to float
        # print(inputs.shape)
        outputs = model(inputs)
        data_loss = criterion(outputs, targets)
        phys_loss = physics_loss_autograd(outputs, t)
        loss = data_loss + phys_loss
        total_loss += loss.item()
    if (epoch+1) % 20 == 0:
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {total_loss/len(train_loader)}')

Has anyone worked on a similar problem? Any guidance on how to compute the temporal derivative of the LSTM output would be really helpful!
Additional Information:
The problem arises when I try to compute the physics-informed loss via torch.autograd.grad:
dT_dt = torch.autograd.grad(outputs, t, grad_outputs=torch.ones_like(outputs), create_graph=True)[0]
This returns None for dT_dt. I suspect there’s an issue with how I’m handling the time_step or the autograd setup, but I’m not sure what exactly is going wrong.
Upvotes: 0
Views: 119
Reputation: 5473
PyTorch computes gradients using autograd. Autograd records the operations of the forward pass in a computational graph, then traverses that graph backward to compute gradients.
This means that computing the gradient of one value with respect to another requires the two to be linked in the computational graph. Conversely, PyTorch cannot compute gradients between variables that are not linked by a computational graph. A simple example:
import torch

a = torch.randn(8, requires_grad=True)
b = torch.randn(8, requires_grad=True)
c = torch.randn(8, requires_grad=True)  # never used to compute y

y = a*b  # y depends on a and b only

dy_da = torch.autograd.grad(y, a, grad_outputs=torch.ones_like(y), retain_graph=True)[0]
dy_db = torch.autograd.grad(y, b, grad_outputs=torch.ones_like(y), retain_graph=True)[0]
dy_dc = torch.autograd.grad(y, c, grad_outputs=torch.ones_like(y), retain_graph=True, allow_unused=True)[0]

In the above, dy_da and dy_db are tensors, while dy_dc is None. This is because c does not participate in the computational graph that produces y: there is no computation linking c to y.
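As a side note, here is a small sketch of what happens when allow_unused=True is left out, which is how the call in the question is written. As far as I know, autograd then raises a RuntimeError for an unlinked tensor instead of returning None. The tensors below are made up for illustration.

```python
import torch

a = torch.randn(8, requires_grad=True)
c = torch.randn(8, requires_grad=True)  # never used to compute y
y = a * 2

# Without allow_unused=True, asking for the gradient w.r.t. an
# unlinked tensor is reported as an error.
try:
    torch.autograd.grad(y, c, grad_outputs=torch.ones_like(y), retain_graph=True)
    raised = False
except RuntimeError:
    raised = True

# With allow_unused=True, the same request yields None for c.
dy_dc = torch.autograd.grad(
    y, c, grad_outputs=torch.ones_like(y), retain_graph=True, allow_unused=True
)[0]
```

Either way, the message is the same: the tensor you are differentiating with respect to never took part in producing the output.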
For your example, your values t do not participate in the computation of your outputs, therefore it is not possible to compute the gradient of your outputs with respect to t.
If you want this to be possible, you need to design your model such that t is an input to the model function that generates outputs.
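A minimal sketch of one way to do that, under several assumptions of mine: the class name TimeAwareLSTM, the idea of concatenating t with the last LSTM hidden state, the tanh head, and the values of k and T_ambient are all hypothetical and not taken from your code. The point is only that t is created with requires_grad=True and is passed through the forward pass, so autograd has a path from outputs back to t.

```python
import torch
import torch.nn as nn

class TimeAwareLSTM(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=1, hidden_size=64, num_layers=1, output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # The head sees the LSTM summary plus the query time t.
        self.head = nn.Sequential(
            nn.Linear(hidden_size + 1, hidden_size),
            nn.Tanh(),  # smooth activation, so dT/dt is not piecewise constant
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x, t):
        out, _ = self.lstm(x)                        # (batch, seq_len, hidden)
        features = torch.cat([out[:, -1, :], t], dim=1)
        return self.head(features)                   # (batch, output_size)

def physics_loss_autograd(outputs, t, k, T_ambient):
    # Each output row depends only on its own row of t, so this gives
    # the per-sample derivative dT/dt with shape (batch, 1).
    dT_dt = torch.autograd.grad(
        outputs, t, grad_outputs=torch.ones_like(outputs), create_graph=True
    )[0]
    # Newton's law of cooling: dT/dt = -k (T - T_ambient)
    residual = dT_dt + k * (outputs - T_ambient)
    return torch.mean(residual ** 2)

torch.manual_seed(0)
model = TimeAwareLSTM()
x = torch.rand(4, 3, 1)  # dummy batch: 4 windows of 3 readings
# t must be floating point and must require grad.
t = torch.arange(4, dtype=torch.float32).reshape(-1, 1).clone().requires_grad_(True)

outputs = model(x, t)
# k and T_ambient are placeholder values, not fitted constants.
phys_loss = physics_loss_autograd(outputs, t, k=0.1, T_ambient=0.0)
phys_loss.backward()
```

Note that torch.arange(0, 100) as written in the question produces an integer tensor, which cannot require gradients, so the time values need a floating-point dtype as well. Whether feeding t into the head like this is the right architecture for your data is a separate question; the sketch only shows how to get a usable dT/dt.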
Upvotes: 1