Reputation: 21
The following is from a project I'm doing in Udacity's Deep Learning program; the project is on generating TV scripts. The error I encountered is shown below. The following function is the one used after model training.
def generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len=100):
    """
    Generate text using the neural network
    :param rnn: The PyTorch Module that holds the trained neural network
    :param prime_id: The word id to start the first prediction
    :param int_to_vocab: Dict of word id keys to word values
    :param token_dict: Dict of punctuation token keys to punctuation values
    :param pad_value: The value used to pad a sequence
    :param predict_len: The length of text to generate
    :return: The generated text
    """
    rnn.eval()

    # create a sequence (batch_size=1) with the prime_id
    current_seq = np.full((1, sequence_length), pad_value)
    current_seq[-1][-1] = prime_id
    predicted = [int_to_vocab[prime_id]]

    for _ in range(predict_len):
        if train_on_gpu:
            current_seq = torch.LongTensor(current_seq).cuda()
        else:
            current_seq = torch.LongTensor(current_seq)

        # initialize the hidden state
        hidden = rnn.init_hidden(current_seq.size(0))

        # get the output of the rnn
        output, _ = rnn(current_seq, hidden)

        # get the next word probabilities
        p = F.softmax(output, dim=1).data
        if train_on_gpu:
            p = p.cpu()  # move to cpu

        # use top_k sampling to get the index of the next word
        top_k = 5
        p, top_i = p.topk(top_k)
        top_i = top_i.numpy().squeeze()

        # select the likely next word index with some element of randomness
        p = p.numpy().squeeze()
        word_i = np.random.choice(top_i, p=p / p.sum())

        # retrieve that word from the dictionary
        word = int_to_vocab[word_i]
        predicted.append(word)

        # the generated word becomes the next "current sequence" and the cycle can continue
        current_seq = np.roll(current_seq, -1, 1)
        current_seq[-1][-1] = word_i

    gen_sentences = ' '.join(predicted)

    # Replace punctuation tokens
    for key, token in token_dict.items():
        ending = ' ' if key in ['\n', '(', '"'] else ''
        gen_sentences = gen_sentences.replace(' ' + token.lower(), key)
    gen_sentences = gen_sentences.replace('\n ', '\n')
    gen_sentences = gen_sentences.replace('( ', '(')

    # return all the sentences
    return gen_sentences
After this, the following code is run:
# run the cell multiple times to get different results!
gen_length = 400 # modify the length to your preference
prime_word = 'jerry' # name for starting the script
pad_word = helper.SPECIAL_WORDS['PADDING']
generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
print(generated_script)
Upon running this code, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-40-68a17c4d1704> in <module>()
7 """
8 pad_word = helper.SPECIAL_WORDS['PADDING']
----> 9 generated_script = generate(trained_rnn, vocab_to_int[prime_word + ':'], int_to_vocab, token_dict, vocab_to_int[pad_word], gen_length)
10 print(generated_script)
3 frames
<ipython-input-39-b86c7a305356> in generate(rnn, prime_id, int_to_vocab, token_dict, pad_value, predict_len)
53
54 # the generated word becomes the next "current sequence" and the cycle can continue
---> 55 current_seq = np.roll(current_seq, -1, 1)
56 current_seq[-1][-1] = word_i
57
<__array_function__ internals> in roll(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/numpy/core/numeric.py in roll(a, shift, axis)
1179
1180 """
-> 1181 a = asanyarray(a)
1182 if axis is None:
1183 return roll(a.ravel(), shift, 0).reshape(a.shape)
/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asanyarray(a, dtype, order)
136
137 """
--> 138 return array(a, dtype, copy=False, order=order, subok=True)
139
140
/usr/local/lib/python3.6/dist-packages/torch/tensor.py in __array__(self, dtype)
490 def __array__(self, dtype=None):
491 if dtype is None:
--> 492 return self.numpy()
493 else:
494 return self.numpy().astype(dtype, copy=False)
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Can anyone please help me out?
Upvotes: 2
Views: 12185
Reputation: 32972
np.roll(current_seq, -1, 1)
requires its input to be a NumPy array, but at this point current_seq is a torch tensor (it was converted with torch.LongTensor and moved to the GPU with .cuda() at the top of the loop). NumPy therefore tries to convert the tensor to an array, which fails because the tensor is on the GPU. To convert it to a NumPy array, the tensor first has to be on the CPU:
current_seq = np.roll(current_seq.cpu(), -1, 1)
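As a quick sanity check outside the project code, here is a minimal sketch (assuming a CUDA device is available) showing that the conversion only succeeds once the tensor is back on the CPU:

import numpy as np
import torch

t = torch.zeros((1, 4), dtype=torch.long)
if torch.cuda.is_available():
    t = t.cuda()
    # np.roll(t, -1, 1) would raise the same TypeError here, because
    # np.roll calls np.asanyarray, which in turn calls t.numpy()
rolled = np.roll(t.cpu(), -1, 1)  # .cpu() copies the tensor to host memory first
print(type(rolled))  # <class 'numpy.ndarray'>

Note that np.roll returns a NumPy array, so after the fixed line current_seq is an ndarray again and the torch.LongTensor(current_seq) conversion at the top of the next loop iteration should keep working as before.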
Upvotes: 1