Reputation: 43
If I have to use pretrained word vectors as the embedding layer in a neural network (e.g. a CNN), how do I deal with index 0?
Detail:
We usually start by creating a 2D NumPy array of zeros and later fill in the rows at the indices of the words in our vocabulary. The problem is that 0 is already the index of another word in our vocabulary (say, 'i' is at index 0), so every padded position effectively becomes the word 'i' instead of an empty word. How, then, do we pad all the sentences to equal length?
One easy idea that pops to mind is to pad with another index, numberOfWordsInVocab + 1. But wouldn't that take more space? [Help me!]
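For illustration, a minimal sketch of the setup described above (the tiny vocabulary and the random stand-in for the pretrained vectors are made up):

import numpy as np

# hypothetical vocabulary: note that the word 'i' sits at index 0
vocab = {'i': 0, 'like': 1, 'cats': 2}
embedding_dim = 50

# stand-in for real pretrained vectors (e.g. GloVe/word2vec) so the sketch runs
pretrained_vectors = {w: np.random.rand(embedding_dim) for w in vocab}

# the usual recipe: start from a zero matrix, then fill in each word's row
embedding_matrix = np.zeros((len(vocab), embedding_dim))
for word, idx in vocab.items():
    embedding_matrix[idx] = pretrained_vectors[word]

# sentences are padded with 0 to a fixed length, but 0 is also the index of 'i',
# so a padded position is indistinguishable from the word 'i'
padded_sentence = [0, 1, 2, 0, 0]   # "i like cats i i"? or "i like cats <pad> <pad>"?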
Upvotes: 2
Views: 402
Reputation: 1528
If I have to use pretrained word vectors as the embedding layer in a neural network (e.g. a CNN), how do I deal with index 0?
Answer
In general, empty entries can be handled via a weighted cost between the model outputs and the targets. However, when dealing with words and sequential data, things can be a little tricky, and there are several things to consider. Let's make some assumptions and work with those.
Assumptions: we have a vocabulary of N words, and every sentence is padded (or truncated) to the same max_length.
Details: the embedding matrix has one row for each of the N words, so a token index is just a lookup into that table. The lookup (hash) table is significantly fast: it simply selects row vectors from the embedding matrix.
Task: pad all sentences to equal length without the padding token colliding with the word that sits at index 0.
Suggestions:
- Shift every word index up by one (i = i + 1) and reserve index 0 for padding; the embedding matrix should then get a new row at position 0 (for example all zeros). Pad every sequence with 0 up to max_length.
- Alternatively, keep the indices as they are and mask out the padded positions with a weight vector that enters the cost. That is, if we have a sequence of word tokens [0,5,6,2,178,24,0,NaN,NaN], the corresponding weight vector is [1,1,1,1,1,1,1,0,0].
- In terms of size, the extra padding row is 1 row vs. the N rows that are already there (we have N words, and N is large), so the overhead is negligible.
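A minimal sketch of the first two suggestions, assuming a plain Python/NumPy pipeline (the names tokenize, pad and PAD_ID are mine, not from any particular library):

import numpy as np

PAD_ID = 0           # index 0 is reserved for padding
max_length = 9
embedding_dim = 50

def tokenize(words, vocab):
    # shift every vocabulary index by one (i = i + 1) so that 0 stays free for padding
    return [vocab[w] + 1 for w in words]

def pad(token_ids, max_length):
    # pad with PAD_ID and build the matching weight (mask) vector
    n_pad = max_length - len(token_ids)
    return token_ids + [PAD_ID] * n_pad, [1] * len(token_ids) + [0] * n_pad

# the embedding matrix gets one extra row at position 0 for the padding token
pretrained = np.random.rand(1000, embedding_dim)   # stand-in for the real pretrained vectors
embedding_matrix = np.vstack([np.zeros((1, embedding_dim)), pretrained])

vocab = {'i': 0, 'like': 1, 'cats': 2}
padded, weights = pad(tokenize(['i', 'like', 'cats'], vocab), max_length)
# padded  -> [1, 2, 3, 0, 0, 0, 0, 0, 0]
# weights -> [1, 1, 1, 0, 0, 0, 0, 0, 0]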
In terms of complexity, the index shift is something that can later be incorporated into the initial tokenize function. The predictions and the model complexity are the larger issue and the more important requirement of the system.
Upvotes: 0
Reputation: 198
One easy idea that pops to mind is to pad with another index, numberOfWordsInVocab + 1. But wouldn't that take more space?
Nope! That's the same size. A NumPy array's memory footprint depends only on its shape and dtype, not on the values stored in it, so padding with numberOfWordsInVocab + 1 instead of 0 costs nothing extra:
>>> import numpy as np
>>> a = np.full((5000, 5000), 7)
>>> a.nbytes
200000000
>>> b = np.zeros((5000, 5000))
>>> b.nbytes
200000000
Upvotes: 1