Reputation: 429
I'm building a text classifier in PyTorch and have run into trouble with the .cuda() method. I know that .cuda() moves all of a model's parameters to the GPU so that training can run faster. However, calling .cuda() raises an error:
start_time = time.time()
for model_type in ('lstm',):
    hyperparam_combinations = score_util.all_combination(hyperparam_dict[model_type].values())
    # for selecting best scoring model
    for test_idx, setting in enumerate(hyperparam_combinations):
        args = custom_dataset.list_to_args(setting,model_type=model_type)
        print(args)
        tsv = "test %d\ttrain_loss\ttrain_acc\ttrain_auc\tval_loss\tval_acc\tval_auc\n"%(test_idx) # tsv record
        avg_score = [] # cv_mean score
        ### 4 fold cross validation
        for cv_num,(train_iter,val_iter) in enumerate(cv_splits):
            ### model initiation
            model = model_dict[model_type](args)
            if args.emb_type is not None: # word embedding init
                emb = emb_dict[args.emb_type]
                emb = score_util.embedding_init(emb,tr_text_field,args.emb_type)
                model.embed.weight.data.copy_(emb)
            model.cuda()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-20-ff6cfce73c10> in <module>()
23 model.embed.weight.data.copy_(emb)
24
---> 25 model.cuda()
26
27 optimizer= torch.optim.Adam(model.parameters(),lr=args.lr)
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in cuda(self, device_id)
145 copied to that device
146 """
--> 147 return self._apply(lambda t: t.cuda(device_id))
148
149 def cpu(self, device_id=None):
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
116 def _apply(self, fn):
117 for module in self.children():
--> 118 module._apply(fn)
119
120 for param in self._parameters.values():
~\Anaconda3\lib\site-packages\torch\nn\modules\module.py in _apply(self, fn)
122 # Variables stored in modules are graph leaves, and we don't
123 # want to create copy nodes, so we have to unpack the data.
--> 124 param.data = fn(param.data)
125 if param._grad is not None:
126 param._grad.data = fn(param._grad.data)
RuntimeError: Variable data has to be a tensor, but got torch.cuda.FloatTensor
This is the error traceback, and I can't see why it happens. The code worked fine before I set the epoch parameter to 1 to run some tests. I have since set epoch back to 1000, but the problem persists. Aren't torch.cuda.FloatTensor objects also tensors? Any help would be much appreciated.
My model looks like this:
class TR_LSTM(nn.Module):
    def __init__(self,args,
                 use_hidden_average=False,
                 pretrained_emb = None):
        super(TR_LSTM,self).__init__()
        # arguments
        self.emb_dim = args.embed_dim
        self.emb_num = args.embed_num
        self.num_hidden_unit = args.hidden_state_dim
        self.num_lstm_layer = args.num_lstm_layer
        self.use_hidden_average = use_hidden_average
        self.batch_size = args.batch_size
        # layers
        self.embed = nn.Embedding(self.emb_num, self.emb_dim)
        if pretrained_emb is not None:
            self.embed.weight.data.copy_(pretrained_emb)
        self.lstm_layer = nn.LSTM(self.emb_dim, self.num_hidden_unit, self.num_lstm_layer, batch_first = True)
        self.fc_layer = nn.Sequential(nn.Linear(self.num_hidden_unit,self.num_hidden_unit),
                                      nn.Linear(self.num_hidden_unit,2))
    def forward(self,x):
        x = self.embed(x) # batch * max_seq_len * emb_dim
        h_0,c_0 = self.init_hidden(x.size(0))
        x, (_, _) = self.lstm_layer(x, (h_0,c_0)) # batch * seq_len * hidden_unit_num
        if not self.use_hidden_average:
            x = x[:,x.size(1)-1,:]
            x = x.squeeze(1)
        else:
            x = x.mean(1).squeeze(1)
        x = self.fc_layer(x)
        return x
    def init_hidden(self,batch_size):
        h_0, c_0 = torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit),\
                   torch.zeros(self.num_lstm_layer,batch_size , self.num_hidden_unit)
        h_0, c_0 = h_0.cuda(), c_0.cuda()
        h_0_param, c_0_param = torch.nn.Parameter(h_0), torch.nn.Parameter(c_0)
        return h_0_param, c_0_param
Upvotes: 1
Views: 3102
Reputation: 118
model.cuda() is called inside your training/test loop, which is the problem. As the error message suggests, you repeatedly convert the parameters (tensors) in your model to CUDA, which is not the right way to move a model to the GPU.
The model object should be created and moved to the GPU once, outside the loop. Only the training/test instances should be converted to CUDA tensors each time you feed them to the model. I also suggest reading the example code on the PyTorch documentation site.
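Here is a minimal sketch of that pattern, not a drop-in fix for the code in the question. TinyClassifier, the layer sizes, and the random batches are hypothetical stand-ins for TR_LSTM and the real data iterators, and it assumes a PyTorch version that has torch.device and .to(device) (0.4 or later). It falls back to the CPU when no GPU is available.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for TR_LSTM; all sizes are arbitrary."""

    def __init__(self, vocab_size=50, emb_dim=8, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 2)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))  # batch * seq_len * hidden
        return self.fc(out[:, -1, :])      # keep the last time step only


# Use the GPU when present, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the model and move it to the device once, before the batch loop.
model = TinyClassifier().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Fake data (assumption): three batches of 4 sequences, 10 token ids each.
torch.manual_seed(0)
batches = [
    (torch.randint(0, 50, (4, 10)), torch.randint(0, 2, (4,)))
    for _ in range(3)
]

losses = []
for inputs, labels in batches:
    # Only the batch is moved to the device on each iteration.
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

The optimizer is created after the model has been moved, so it holds references to the parameters that live on the target device.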
Upvotes: 4