DukeLover

Reputation: 2457

RNN Implementation

I am going to implement an RNN using PyTorch. Before that, though, I am having some difficulty understanding the character-level one-hot encoding that the question asks for.

Please find the question below.

For example, I have read a novel into Python. It has 97 unique characters and roughly 300,000 characters in total.

So, will my input be a 97 x 256 one-hot encoded matrix?

Or will it be a 300,000 x 256 one-hot encoded matrix?

Upvotes: 0

Views: 164

Answers (1)

macharya

Reputation: 567

In a one-hot encoding, each vector has a 1 in exactly one position and 0 everywhere else, and each distinct character gets its own position. So if you have 97 unique characters, I think you should use a one-hot vector of size 98 (97 + 1). The extra position is for unknown characters: any character outside the vocabulary maps to it. You can also use a vector of length 256. So your input will be:

B x N x V (B = batch size, N = number of characters in each sequence, V = one-hot vector size).
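To make the N x V part concrete, here is a minimal sketch in plain Python. The sample string, the `UNK` index, and the `one_hot` helper are made up for illustration and are not from the question; a real novel would simply give a larger N and V.

```python
# Hypothetical sample text standing in for the novel.
text = "hello world"

# Vocabulary: the sorted unique characters, plus one extra slot
# reserved for characters that never appeared in the text.
chars = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(chars)}
UNK = len(chars)          # index of the "unknown character" slot
V = len(chars) + 1        # one-hot vector size

def one_hot(ch):
    """Return a length-V list with a single 1 at the character's index."""
    vec = [0] * V
    vec[char_to_idx.get(ch, UNK)] = 1
    return vec

# One row per character in the text: an N x V matrix.
encoded = [one_hot(c) for c in text]
N = len(encoded)

print(N, V)  # rows = total characters, columns = vocabulary size + 1
```

The number of rows follows the total character count and the number of columns follows the vocabulary size, so under this scheme the whole novel would be roughly 300,000 x 98 before it is cut into batches of shape B x N x V.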

But if you are using a library, it will usually ask for the index of each character in the vocabulary and handle the index-to-one-hot conversion itself. Hope that helps.
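As a rough sketch of what that looks like in PyTorch: the batch size, sequence length, hidden size, and embedding size below are arbitrary assumptions, and the random indices stand in for a real encoded text. `torch.nn.functional.one_hot` does the explicit conversion, while `nn.Embedding` takes the indices directly, which amounts to multiplying a one-hot vector by a learned weight matrix without ever building the one-hot tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

V = 98          # 97 characters + 1 unknown slot, as suggested above
B, N = 4, 10    # assumed batch size and sequence length

# Random character indices standing in for an encoded batch of text.
idx = torch.randint(0, V, (B, N))

# Explicit one-hot conversion: (B, N) -> (B, N, V).
x = F.one_hot(idx, num_classes=V).float()

# With batch_first=True the RNN accepts input of shape (B, N, V).
rnn = nn.RNN(input_size=V, hidden_size=32, batch_first=True)
out, h = rnn(x)   # out: (B, N, 32), h: (1, B, 32)

# Alternative: pass the indices straight to an embedding layer.
emb = nn.Embedding(num_embeddings=V, embedding_dim=16)
e = emb(idx)      # (B, N, 16)

print(x.shape, out.shape, e.shape)
```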

Upvotes: 1
