Reputation: 1246
I have code written for a previous version of PyTorch and I get 2 warnings for the 3rd line of it:
import torch.nn.functional as F

def select_action(self, state):
    probabilities = F.softmax(self.model(Variable(state, volatile = True))*100) # T=100
    action = probs.multinomial(num_samples=1)
    return action.data[0,0]
UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
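Regarding the second warning, softmax needs to be told along which dimension it should normalize. A minimal sketch with a made-up 1x3 tensor of Q-values shows the difference:

import torch
import torch.nn.functional as F

q_values = torch.tensor([[1.0, 2.0, 3.0]])  # shape (1, 3): dim 0 is the batch, dim 1 the actions

# dim=1 normalizes across the actions of each row, so each row sums to 1
print(F.softmax(q_values, dim=1))  # tensor([[0.0900, 0.2447, 0.6652]])

# dim=0 would normalize across the batch instead; with batch size 1 every entry becomes 1.0
print(F.softmax(q_values, dim=0))  # tensor([[1., 1., 1.]])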
I found that:
Volatile is recommended for purely inference mode, when you’re sure you won’t be even calling .backward(). It’s more efficient than any other autograd setting - it will use the absolute minimal amount of memory to evaluate the model. volatile also determines that requires_grad is False.
Am I right that I should just remove it? And since I want to get probabilities, should I use dim=1? So the 3rd line of my code should look like:
probabilities = F.softmax(self.model(Variable(state))*100, dim=1) # T=100
state is created here:
def update(self, reward, new_signal):
    new_state = torch.Tensor(new_signal).float().unsqueeze(0)
    self.memory.push((self.last_state, new_state, torch.LongTensor([int(self.last_action)]), torch.Tensor([self.last_reward])))
    action = self.select_action(new_state)
    if len(self.memory.memory) > 100:
        batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(100)
        self.learn(batch_state, batch_next_state, batch_reward, batch_action)
    self.last_action = action
    self.last_state = new_state
    self.last_reward = reward
    self.reward_window.append(reward)
    if len(self.reward_window) > 1000:
        del self.reward_window[0]
    return action
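For reference, the unsqueeze(0) above is what gives state its leading batch dimension, which is why dim=1 is the dimension to softmax over. A small sketch with a hypothetical 5-value signal:

import torch

new_signal = [0.1, 0.2, 0.3, 0.4, 0.5]  # hypothetical sensor readings
new_state = torch.Tensor(new_signal).float().unsqueeze(0)

print(new_state.shape)  # torch.Size([1, 5]): dim 0 is the batch of size 1, dim 1 holds the features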
Upvotes: 2
Views: 918
Reputation: 421
I found the same source code, the "Self Driving Car" application, written in Python 2.7. I wasn't able to install pytorch/pytorch-cpu for Python 2.7 (CUDA driver issues...), so I had to fix the code to run under Python 3.*.
Here is what I changed to make it work (it includes the changes suggested above by other people): update the select_action and learn functions of the Dqn class like this:
def select_action(self, state):
    with torch.no_grad():
        probs = F.softmax(self.model(state) * 100, dim=1)  # T=100
        action = probs.multinomial(num_samples=1)
        return action.data[0, 0]
def learn(self, batch_state, batch_next_state, batch_reward, batch_action):
    outputs = self.model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
    next_outputs = self.model(batch_next_state).detach().max(1)[0]
    target = self.gamma * next_outputs + batch_reward
    td_loss = F.smooth_l1_loss(outputs, target)
    self.optimizer.zero_grad()
    td_loss.backward()
    self.optimizer.step()
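To make the gather and detach calls in learn concrete, here is a small sketch with made-up Q-values for a batch of two states and three actions:

import torch

q_values = torch.tensor([[1.0, 2.0, 3.0],
                         [4.0, 5.0, 6.0]])   # Q-values: 2 states x 3 actions
actions = torch.LongTensor([2, 0])           # actions actually taken in each state

# gather(1, ...) picks the Q-value of the taken action in each row
chosen = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
print(chosen)  # tensor([3., 4.])

# max(1)[0] gives the best value per row; detach() keeps gradients from
# flowing through the target, as in learn() above
best_next = q_values.detach().max(1)[0]
print(best_next)  # tensor([3., 6.])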
Upvotes: 1
Reputation: 24701
You are right but not "fully" right.
In addition to the changes you mentioned, you should use torch.no_grad() as mentioned in the warning, like this:
def select_action(self, state):
    with torch.no_grad():
        probabilities = F.softmax(self.model(state)*100, dim=1) # T=100
        action = probabilities.multinomial(num_samples=1)
        return action.data[0,0]
This block turns off the autograd engine for the code within it (so you save memory, similarly to volatile).
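As a quick illustration (with a made-up tensor), nothing computed inside the block is recorded for backpropagation:

import torch

x = torch.ones(2, 2, requires_grad=True)

with torch.no_grad():
    y = (x * 2).sum()

print(y.requires_grad)  # False -- the operations were not recorded
# y.backward()          # would raise a RuntimeError: y does not require grad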
Also please notice that Variable is deprecated as well (check here) and state should simply be a torch.tensor created with requires_grad=True.
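A minimal sketch of the old and the current style side by side (the values are just for illustration):

import torch
from torch.autograd import Variable  # still importable, but deprecated

old = Variable(torch.Tensor([1.0, 2.0]), requires_grad=True)  # old style
new = torch.tensor([1.0, 2.0], requires_grad=True)            # current style

print(type(old), old.requires_grad)  # <class 'torch.Tensor'> True
print(type(new), new.requires_grad)  # <class 'torch.Tensor'> True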
BTW. You have probs and probabilities but I assume it's the same thing and merely a typo.
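For completeness, a rough way to exercise the corrected select_action with a stand-in model (the linear layer, its sizes, and the state below are made up for the demo):

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(5, 3)   # stand-in for self.model: 5 inputs, 3 possible actions
state = torch.rand(1, 5)  # one state with a batch dimension, like unsqueeze(0) produces

with torch.no_grad():
    probabilities = F.softmax(model(state) * 100, dim=1)  # T=100 sharpens the distribution
    action = probabilities.multinomial(num_samples=1)

print(action.data[0, 0])  # sampled action index: tensor(0), tensor(1) or tensor(2)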
Upvotes: 3