Peter Kim

Reputation: 429

PyTorch model.cuda() runtime error

I'm building a text classifier with PyTorch and have run into trouble with the .cuda() method. I know that .cuda() moves all of a model's parameters to the GPU so that training runs faster. However, the call to .cuda() raises an error:

start_time = time.time()

for model_type in ('lstm',):

    hyperparam_combinations = score_util.all_combination(hyperparam_dict[model_type].values())
    # for selecting best scoring model

    for test_idx, setting in enumerate(hyperparam_combinations):
        args = custom_dataset.list_to_args(setting, model_type=model_type)
        print(args)
        tsv = "test %d\ttrain_loss\ttrain_acc\ttrain_auc\tval_loss\tval_acc\tval_auc\n" % (test_idx)  # tsv record
        avg_score = []  # cv_mean score

        ### 4-fold cross-validation
        for cv_num, (train_iter, val_iter) in enumerate(cv_splits):

            ### model initialization
            model = model_dict[model_type](args)

            if args.emb_type is not None:  # word embedding init
                emb = emb_dict[args.emb_type]
                emb = score_util.embedding_init(emb, tr_text_field, args.emb_type)
                model.embed.weight.data.copy_(emb)

            model.cuda()

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-20-ff6cfce73c10> in <module>()
     23                 model.embed.weight.data.copy_(emb)
     24 
---> 25             model.cuda()
     26 
     27             optimizer= torch.optim.Adam(model.parameters(),lr=args.lr)

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in cuda(self, device_id)
    145                 copied to that device
    146         """
--> 147         return self._apply(lambda t: t.cuda(device_id))
    148 
    149     def cpu(self, device_id=None):

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    116     def _apply(self, fn):
    117         for module in self.children():
--> 118             module._apply(fn)
    119 
    120         for param in self._parameters.values():

~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
    122                 # Variables stored in modules are graph leaves, and we don't
    123                 # want to create copy nodes, so we have to unpack the data.
--> 124                 param.data = fn(param.data)
    125                 if param._grad is not None:
    126                     param._grad.data = fn(param._grad.data)

RuntimeError: Variable data has to be a tensor, but got torch.cuda.FloatTensor

That is the full traceback, and I can't see why this happens. The code worked fine until I set the epoch parameter to 1 to run some quick tests; setting it back to 1000 didn't make the problem go away. Aren't torch.cuda.FloatTensor objects also tensors? Any help would be much appreciated.

My model looks like this:

class TR_LSTM(nn.Module):
    def __init__(self, args,
                 use_hidden_average=False,
                 pretrained_emb=None):

        super(TR_LSTM, self).__init__()
        # arguments
        self.emb_dim = args.embed_dim
        self.emb_num = args.embed_num
        self.num_hidden_unit = args.hidden_state_dim
        self.num_lstm_layer = args.num_lstm_layer
        self.use_hidden_average = use_hidden_average
        self.batch_size = args.batch_size

        # layers
        self.embed = nn.Embedding(self.emb_num, self.emb_dim)
        if pretrained_emb is not None:
            self.embed.weight.data.copy_(pretrained_emb)

        self.lstm_layer = nn.LSTM(self.emb_dim, self.num_hidden_unit, self.num_lstm_layer, batch_first=True)
        self.fc_layer = nn.Sequential(nn.Linear(self.num_hidden_unit, self.num_hidden_unit),
                                      nn.Linear(self.num_hidden_unit, 2))

    def forward(self, x):
        x = self.embed(x)  # batch * max_seq_len * emb_dim
        h_0, c_0 = self.init_hidden(x.size(0))
        x, (_, _) = self.lstm_layer(x, (h_0, c_0))  # batch * seq_len * hidden_unit_num

        if not self.use_hidden_average:
            x = x[:, x.size(1) - 1, :]
            x = x.squeeze(1)
        else:
            x = x.mean(1).squeeze(1)
        x = self.fc_layer(x)

        return x

    def init_hidden(self, batch_size):
        h_0, c_0 = torch.zeros(self.num_lstm_layer, batch_size, self.num_hidden_unit), \
                   torch.zeros(self.num_lstm_layer, batch_size, self.num_hidden_unit)
        h_0, c_0 = h_0.cuda(), c_0.cuda()
        h_0_param, c_0_param = torch.nn.Parameter(h_0), torch.nn.Parameter(c_0)
        return h_0_param, c_0_param

Upvotes: 1

Views: 3102

Answers (1)

Jon

Reputation: 118

model.cuda() is called inside your training/test loop, which is the problem. As the error message suggests, you are repeatedly converting the parameters (tensors) in your model to CUDA, and that is not the right way to move a model to the GPU.

The model object should be created, and moved to CUDA, once, outside the loop. Only the training/test instances should be converted to CUDA tensors each time you feed them to the model. I also suggest reading the example code on the PyTorch documentation site.
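Here is a minimal sketch of that structure, reusing the names from your snippet (model_dict, args, train_iter). The batch attributes (batch.text, batch.label) and the cross-entropy loss are assumptions on my part, since that part of your loop isn't shown:

import torch
import torch.nn.functional as F

# Create the model and move its parameters to the GPU exactly once.
model = model_dict[model_type](args)
model.cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)

for epoch in range(args.epoch):
    for batch in train_iter:
        # Only the current batch is moved to the GPU here; the model's
        # parameters already live there, so model.cuda() is never called again.
        text, label = batch.text.cuda(), batch.label.cuda()
        optimizer.zero_grad()
        output = model(text)
        loss = F.cross_entropy(output, label)
        loss.backward()
        optimizer.step()

If you re-create the model for each cross-validation fold, call .cuda() once right after constructing it, not once per epoch or per batch.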

Upvotes: 4
