Kosh

Reputation: 1246

softmax dims and variable volatile in PyTorch

I have code written for a previous version of PyTorch, and I get two warnings for the third line of it:

import torch.nn.functional as F

def select_action(self, state):
        probabilities = F.softmax(self.model(Variable(state, volatile = True))*100) # T=100
        action = probs.multinomial(num_samples=1)
        return action.data[0,0]

UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.

UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.

I found that:

Volatile is recommended for purely inference mode, when you’re sure you won’t be even calling .backward(). It’s more efficient than any other autograd setting - it will use the absolute minimal amount of memory to evaluate the model. volatile also determines that requires_grad is False.

Am I right that I should just remove it? And since I want to get probabilities, should I use dim=1? So the third line of my code should look like this:

    probabilities = F.softmax(self.model(Variable(state))*100, dim=1) # T=100

state is created here:

def update(self, reward, new_signal):
    new_state = torch.Tensor(new_signal).float().unsqueeze(0)
    self.memory.push((self.last_state, new_state, torch.LongTensor([int(self.last_action)]), torch.Tensor([self.last_reward])))
    action = self.select_action(new_state)
    if len(self.memory.memory) > 100:
        batch_state, batch_next_state, batch_action, batch_reward = self.memory.sample(100)
        self.learn(batch_state, batch_next_state, batch_reward, batch_action)
    self.last_action = action
    self.last_state = new_state
    self.last_reward = reward
    self.reward_window.append(reward)
    if len(self.reward_window) > 1000:
        del self.reward_window[0]
    return action

Upvotes: 2

Views: 918

Answers (2)

Vlad D.

Reputation: 421

I found the same source code written in Python 2.7 - the "Self Driving Car" application. I wasn't able to install pytorch/pytorch-cpu for Python 2.7 (CUDA driver issues...), so I had to fix the code to run under Python 3.*.

Here is what I changed to make it work (this includes changes suggested by other people above): update the select_action and learn functions of the Dqn class like this:

    def select_action(self, state):
        with torch.no_grad():
            probs = F.softmax(self.model(state) * 100, dim=1)  # T=100
            action = probs.multinomial(num_samples=1)
            return action.data[0, 0]

    def learn(self, batch_state, batch_next_state, batch_reward, batch_action):
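        # Q(s, a) for the actions that were actually taken in the sampled batch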
        outputs = self.model(batch_state).gather(1, batch_action.unsqueeze(1)).squeeze(1)
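        # max over actions of Q(s', a); detach() keeps the target out of the gradient graph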
        next_outputs = self.model(batch_next_state).detach().max(1)[0]
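        # TD target: r + gamma * max_a Q(s', a)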
        target = self.gamma * next_outputs + batch_reward
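        # Huber loss between the current Q estimates and the TD targets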
        td_loss = F.smooth_l1_loss(outputs, target)
        self.optimizer.zero_grad()
        td_loss.backward()
        self.optimizer.step()

Upvotes: 1

Szymon Maszke

Reputation: 24701

You are right but not "fully" right.

Besides the changes you mentioned, you should also use torch.no_grad(), as the warning says, like this:

def select_action(self, state):
    with torch.no_grad():
        probabilities = F.softmax(self.model(state)*100, dim=1) # T=100
        action = probabilities.multinomial(num_samples=1)
        return action.data[0,0]

This block turns off the autograd engine for the code inside it (so you save memory, similarly to volatile).
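
To see what that means in isolation, here is a minimal sketch with a toy tensor (not your DQN model):

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    with torch.no_grad():
        y = x * 2  # no graph is recorded for this operation
    print(y.requires_grad)  # False - y is cut off from autograd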

Also, please notice that Variable is deprecated as well (check here); state should simply be a torch.tensor, created with requires_grad=True only if you actually need gradients for it.
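
For example, the state built in your update method could simply be (a sketch assuming new_signal is a plain list of floats, as in your code):

    # before: Variable(torch.Tensor(new_signal).float().unsqueeze(0), volatile=True)
    new_state = torch.tensor(new_signal, dtype=torch.float32).unsqueeze(0)
    # requires_grad defaults to False; select_action then runs it under no_grad()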

BTW, you have both probs and probabilities, but I assume they refer to the same thing and it's merely a typo.

Upvotes: 3
