dnth

Reputation: 897

Testing the Keras sentiment classification with model.predict

I have trained the imdb_lstm.py example on my PC. Now I want to test the trained network by feeding it some text of my own. How do I do that? Thank you!

Upvotes: 2

Views: 3146

Answers (2)

Lior Magen

Reputation: 1573

So what you basically need to do is as follows:

  1. Tokenize the sequences: split each string into words (features). For example, "hello my name is georgio" becomes ["hello", "my", "name", "is", "georgio"].
  2. Next, remove stop words: very common words such as "my" or "is" that carry little meaning on their own.
  3. This stage is optional; it may lead to faulty results, but I think it is worth a try. Stem your words (features): that reduces the number of distinct features, which leads to a faster run. Again, it is optional and can cause mistakes; for example, stemming 'parking' gives 'park', which has a different meaning.
  4. Next, create a dictionary that gives each word a unique number. From this point on we use only these numbers.
  5. The network understands only numbers, so we need to talk in its language: take the dictionary from stage 4 and replace each word in the corpus with its matching number.
  6. Now split the data set into two groups: a training set and a testing set. The training set trains the NN model, and the testing set tells us how good the NN is. You can do the split yourself, or let Keras hold out a fraction of the training data for validation with the validation_split argument of model.fit.
  7. Next, decide the maximum sequence length, i.e. how many words (features) the NN gets as one input. In Keras this parameter is called maxlen. You don't really have to pick it by hand: if you leave maxlen unset, Keras' pad_sequences uses the length of the longest sentence in your corpus.
  8. For example, say the longest sentence in your corpus has 20 words (features) and one of your sentences is the example from stage 1, whose length is 5 (shorter once stop words are removed). That sentence then needs 15 zeros added so it reaches length 20. This is called padding (pad_sequences in Keras), and it makes every input sequence the same length. When you later call model.predict on your own text, pad it to the same maxlen that was used during training.
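The eight steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code behind imdb_lstm.py: the stop-word list, the regular-expression tokenizer and the helper names (tokenize, remove_stop_words, build_dictionary, encode, split, pad) are hypothetical choices made for this sketch. Stemming (step 3) is left out; a stemmer such as NLTK's PorterStemmer could be applied to the tokens after step 2. The pad helper imitates the defaults of Keras' pad_sequences (zeros added at the front, over-long sequences cut from the front).

```python
import random
import re

# Hypothetical stop-word list, for illustration only; real projects usually
# take a fuller list from a library such as NLTK or scikit-learn.
STOP_WORDS = {"my", "is", "the", "a", "an", "and", "of", "to"}


def tokenize(text):
    """Step 1: lowercase the string and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens):
    """Step 2: drop very common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_dictionary(corpus_tokens):
    """Step 4: give every distinct word a unique integer, starting at 1.

    Index 0 is kept free because step 8 uses it for padding.
    """
    vocab = {}
    for tokens in corpus_tokens:
        for word in tokens:
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode(tokens, vocab):
    """Step 5: replace each word by its number (unknown words are skipped)."""
    return [vocab[w] for w in tokens if w in vocab]


def split(samples, test_fraction=0.2, seed=0):
    """Step 6: shuffle, then split into a training set and a testing set."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) - int(len(items) * test_fraction)
    return items[:cut], items[cut:]


def pad(seq, maxlen):
    """Step 8: keep the last maxlen items and left-pad with zeros,
    like pad_sequences(padding='pre', truncating='pre')."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


corpus = [
    "hello my name is georgio",
    "this movie was great and the acting was superb",
]
tokens = [remove_stop_words(tokenize(t)) for t in corpus]
vocab = build_dictionary(tokens)
sequences = [encode(t, vocab) for t in tokens]
maxlen = max(len(s) for s in sequences)  # Step 7: longest sequence
padded = [pad(s, maxlen) for s in sequences]
print(padded)  # [[0, 0, 0, 0, 1, 2, 3], [4, 5, 6, 7, 8, 6, 9]]
```

The example script itself does step 8 with Keras' sequence.pad_sequences, so in a real pipeline that call would replace the hand-written pad.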

Upvotes: 8

This might help. http://keras.io/models/

Here is a sample usage: How to use keras for XOR

You will probably have to convert your corpus into a NumPy ndarray first and pass it to your model.predict.

From what I can see so far, the input to model.predict for the trained model should be a sequence of 100 word indices, each one the index of a word in the IMDB dictionary. So to run the model on your own corpus, you have to convert your text using that same dictionary and then check whether the predicted value is close to 0 (negative) or 1 (positive).
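To make that conversion concrete, here is a hedged sketch in plain Python. It assumes the model was trained with the default settings of keras.datasets.imdb.load_data (start_char=1, oov_char=2, index_from=3, and num_words set to the script's max_features), so every word's rank from imdb.get_word_index() is shifted by 3 and words outside the vocabulary become 2. Mapping words that are missing from the index entirely to 2 is also an assumption. The function name encode_review, the tokenizer and the toy word_index in the demo are hypothetical; check max_features and maxlen against your own copy of imdb_lstm.py.

```python
import re

# Assumed to match the keras.datasets.imdb.load_data defaults used in training.
START_CHAR = 1  # prepended to every review
OOV_CHAR = 2    # replaces words outside the vocabulary
INDEX_FROM = 3  # offset added to every rank from get_word_index()


def encode_review(text, word_index, num_words, maxlen):
    """Turn raw text into the padded index sequence the IMDB model expects.

    word_index: dict word -> rank, e.g. from keras.datasets.imdb.get_word_index()
    num_words:  vocabulary size used in training (max_features in the script)
    maxlen:     sequence length used in training
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed tokenizer
    seq = [START_CHAR]
    for word in tokens:
        rank = word_index.get(word)
        if rank is None:
            seq.append(OOV_CHAR)  # assumption: unseen words become OOV
        else:
            idx = rank + INDEX_FROM
            seq.append(idx if idx < num_words else OOV_CHAR)
    # Same as pad_sequences defaults: truncate and pad at the front with 0.
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


# Hypothetical toy index with made-up ranks, only to show the mechanics.
toy_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "boring": 50}
print(encode_review("The movie was great!", toy_index, 20, 8))
# [0, 0, 0, 1, 4, 5, 6, 7]
print(encode_review("Boring, boring movie", toy_index, 20, 8))
# [0, 0, 0, 0, 1, 2, 2, 5]
```

With NumPy and the trained model available, something like model.predict(numpy.array([encode_review(text, imdb.get_word_index(), max_features, maxlen)])) should then return one value per review: close to 1 for positive text and close to 0 for negative text.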

Upvotes: 1
