How to: CNTK C# LSTM classifier of free text (NLP) using Word2Vec word embeddings

Question

I am new to CNTK. My environment is C# (unfortunately, I am not a Python or BrainScript programmer).

I am trying to use CNTK to design/train/test an LSTM on free text (NLP) to select an appropriate title (from a given set of titles, about 8,000 of them in my data).

I've used a separate program to map each word into a 100-element vector of real numbers (the 100 is a configurable value; my non-CNTK program, GloVe, can generate any width I select).

My raw input looks something like:

|label 17 |features the brown fox jumped over the ...
|label 19 |features there comes a time when all ...
...

Where '17' is shorthand for the 17th title and really stands for a one-hot representation: [0, 0, ..., 1, 0, 0, ...], where the '1' is in the 17th position.
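To make the label encoding concrete, here is a minimal sketch in plain C# with no CNTK calls. The names `LabelEncoding` and `OneHot` are hypothetical, and the sketch assumes the label is a zero-based index into the set of titles (subtract 1 first if '17' is meant as a one-based position).

```csharp
using System;

public static class LabelEncoding
{
    // Builds a dense one-hot vector of length numClasses with a single 1 at labelIndex.
    // Assumption: labelIndex is zero-based.
    public static float[] OneHot(int labelIndex, int numClasses)
    {
        if (labelIndex < 0 || labelIndex >= numClasses)
            throw new ArgumentOutOfRangeException(nameof(labelIndex));

        var vector = new float[numClasses]; // all elements start at 0
        vector[labelIndex] = 1f;
        return vector;
    }
}
```

With about 8,000 titles, `LabelEncoding.OneHot(17, 8000)` would give an 8,000-element array that is zero everywhere except at index 17.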

Each input row is a sequence of words separated by spaces. The typical length is a few hundred words, but some rows have thousands of words in them.

My issue is that I don't know how to insert a run-time transformation from my raw file format into something CNTK could use.

I can't assume in-memory data since in production we will be training on data that has millions of rows.

In each mini batch:

The '17' (in the example above) needs to be translated to [0, ..., 1, 0, ...].

Each word needs to be translated (via a lookup into C# Dictionary) into an array (of 100) real numbers.
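The per-row translation I have in mind looks roughly like the sketch below. It is plain C# and does not touch CNTK, so it only covers the parsing and lookup, not how the result is handed to the trainer. `RawLineParser`, `Parse`, and `ParseAll` are hypothetical names, and mapping unknown words to an all-zero vector is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class RawLineParser
{
    // Parses one raw line of the form "|label 17 |features the brown fox ...".
    // Returns the label index and one embedding vector per word, in order.
    public static (int Label, float[][] Vectors) Parse(
        string line, IReadOnlyDictionary<string, float[]> embeddings, int dim)
    {
        const string labelTag = "|label";
        const string featuresTag = "|features";

        int featuresAt = line.IndexOf(featuresTag, StringComparison.Ordinal);
        if (!line.StartsWith(labelTag, StringComparison.Ordinal) || featuresAt < 0)
            throw new FormatException("Unexpected line: " + line);

        // The text between the two tags is the label index.
        string labelText = line.Substring(labelTag.Length, featuresAt - labelTag.Length);
        int label = int.Parse(labelText.Trim(), CultureInfo.InvariantCulture);

        // Everything after "|features" is the space-separated word sequence.
        string[] words = line.Substring(featuresAt + featuresTag.Length)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Assumption: a word missing from the dictionary maps to a zero vector.
        var unknown = new float[dim];
        float[][] vectors = words
            .Select(w => embeddings.TryGetValue(w, out var v) ? v : unknown)
            .ToArray();

        return (label, vectors);
    }

    // Parses lines lazily, one at a time, so the full data set is never held in memory.
    public static IEnumerable<(int Label, float[][] Vectors)> ParseAll(
        IEnumerable<string> lines, IReadOnlyDictionary<string, float[]> embeddings, int dim)
    {
        foreach (string line in lines)
        {
            if (!string.IsNullOrWhiteSpace(line))
                yield return Parse(line, embeddings, dim);
        }
    }
}
```

Passing `File.ReadLines(path)` as `lines` keeps the whole pipeline streaming, since it reads the file lazily; a mini batch could then be formed by taking the next N parsed rows.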

I realize this is the Embedding layer in CNTK's LSTM, but I cannot find any tutorial or example (especially in C#) of how to add a transformation layer using a non-one-hot embedding.

For what it's worth, my template for doing this in C# is LSTMSequenceClassifier.cs in the CNTK examples.

Link to CNTK example: https://github.com/Microsoft/CNTK/blob/master/Examples/TrainingCSharp/Common/LSTMSequenceClassifier.cs

Any help would be greatly appreciated. I've racked my brains on this for the past week!

