How do I use the word vector returned by word2vec as features?

Question

I am planning to use the multi-layer perceptron classifier (MLPClassifier) from scikit-learn for this purpose.
The output is the gender of the word, represented in a one-hot encoding such as [1, 0, 0] for male, [0, 1, 0] for female, and [0, 0, 1] for the third class (neuter). One of the inputs is the word vector for the word; each of these vectors has 20 dimensions. The other features are its part-of-speech tag and its singularity (0) / plurality (1) state. My question is: how do I use the word vector, which is an array, as a feature in MLPClassifier?

cs95 · Accepted Answer

Your w2v vector captures some semantic similarity with respect to the word. This vector must be considered as a whole: it is a feature in itself.

One nice attribute of neural networks is their ability to extract and learn patterns on their own. As input, you could concatenate the word vector with a vectorised/numerical equivalent of the POS tag and, finally, the singularity state:

--------------------  ----  -
\__________________/  \__/  |     } ------ 25d vector input to the MLP (assuming your POS tag takes 4 slots)
     w2v vector       POS  state
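
A minimal sketch of that concatenation with NumPy. The four-tag inventory `POS_TAGS`, the helper name `build_features`, and the random stand-in for the word vector are assumptions made for illustration; substitute your own tag set and the real word2vec output.

```python
import numpy as np

# Hypothetical POS inventory; 4 tags give the 4-slot one-hot block in the diagram.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]

def build_features(word_vec, pos_tag, is_plural):
    """Concatenate [w2v vector | one-hot POS | plurality flag] into one 1-D array."""
    pos_onehot = np.zeros(len(POS_TAGS))
    pos_onehot[POS_TAGS.index(pos_tag)] = 1.0
    return np.concatenate(
        [np.asarray(word_vec, dtype=float), pos_onehot, [float(is_plural)]]
    )

# Random stand-in for a real 20-dimensional word2vec vector (assumption).
word_vec = np.random.default_rng(0).normal(size=20)
x = build_features(word_vec, "NOUN", is_plural=1)
print(x.shape)  # (25,)
```

One-hot encoding the POS tag, rather than using a single integer ID, avoids suggesting an ordering among tags that does not exist.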

As long as you follow a consistent scheme across the training, testing, and unseen data, your MLP will use the entire input and automatically extract features from it as it learns.
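
One way to keep the scheme consistent is to fit the POS encoding once on the training data and reuse the same encoder for every later input. The sketch below does this with scikit-learn's `OneHotEncoder` and `MLPClassifier`. All data here is synthetic and random, and the tag set, layer size, and helper name `make_matrix` are assumptions, so the predictions are meaningless; only the shapes and the workflow matter. Note that `MLPClassifier` accepts plain integer class labels, so the target does not have to be one-hot encoded by hand.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n_train, n_new = 200, 5
tags = np.array(["NOUN", "VERB", "ADJ", "ADV"])  # hypothetical tag set

# Synthetic stand-ins for the real data (assumption).
w2v_train = rng.normal(size=(n_train, 20))            # 20-d word vectors
pos_train = rng.choice(tags, size=(n_train, 1))       # POS tag per word
plural_train = rng.integers(0, 2, size=(n_train, 1))  # 0 = singular, 1 = plural
y_train = rng.integers(0, 3, size=n_train)            # 0 = male, 1 = female, 2 = third class

# Fit the POS encoding on the training data only, then reuse it everywhere.
pos_encoder = OneHotEncoder(handle_unknown="ignore")
pos_encoder.fit(pos_train)

def make_matrix(w2v, pos, plural):
    """Stack [w2v | one-hot POS | plurality] column-wise, one row per word."""
    pos_onehot = pos_encoder.transform(pos).toarray()
    return np.hstack([w2v, pos_onehot, plural])

X_train = make_matrix(w2v_train, pos_train, plural_train)
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

# Unseen words go through exactly the same feature construction.
X_new = make_matrix(
    rng.normal(size=(n_new, 20)),
    rng.choice(tags, size=(n_new, 1)),
    rng.integers(0, 2, size=(n_new, 1)),
)
pred = clf.predict(X_new)
print(X_train.shape, X_new.shape, pred.shape)
```

Because the labels are random, scikit-learn may emit a convergence warning; that is expected with this toy data.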
