Reputation: 84
I'm aiming to get my hands dirty with LSTMs by starting small and scaling up. As a first step, I'm trying to build a YouTube comment sentiment analyzer using an LSTM in Keras. While searching for resources, I came across the IMDB sentiment analysis dataset and LSTM example code. It works well for longer inputs, but shorter inputs don't do so well. The code is here: https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py
I saved the Keras model and built a prediction module for this data with the following code:
from keras.models import load_model
from keras.preprocessing.text import text_to_word_sequence
from keras.preprocessing.sequence import pad_sequences
from keras.datasets import imdb

model = load_model('ytsentanalysis.h5')
print("Enter text")
text = input()
# tokenize the comment the same way Keras preprocesses text
words = text_to_word_sequence(text, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=" ")
print(words)
word_index = imdb.get_word_index()
# map each word to its IMDB vocabulary index, skipping unknown words
x_test = [[word_index[w] for w in words if w in word_index]]
x_test = pad_sequences(x_test, maxlen=80)  # must match the maxlen used in training
prediction = model.predict(x_test)
print(prediction)
I feed in various inputs such as 'bad video', 'fantastic amazing', 'good great', and 'terrible bad'. The outputs are close to 1 for negatively themed inputs, and I've seen a prediction around 0.3 for a positively themed input. I'd expect values closer to 1 for positive and closer to 0 for negative.
In an effort to solve this, I limited maxlen=20 during both training and prediction, since YouTube comments are much shorter, and ran the same code again. This time every predicted probability was e^insert large negative power here.
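For reference, the only change from the example's preprocessing was the maxlen value (a sketch, with variable names following imdb_lstm.py):

from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences

maxlen = 20  # YouTube comments are much shorter than IMDB reviews
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=20000)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)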
Is there any way I can adapt and reuse the existing dataset? If not, since labeled YouTube comment datasets aren't as extensive, should I use something like a Twitter dataset, at the expense of losing the convenience of the pre-built IMDB input modules in Keras? And is there any way I can see the code for those modules?
Thank you in advance for answering all these questions.
Upvotes: 1
Views: 775
Reputation: 541
The IMDb dataset and YouTube comments are quite different: movie reviews are long and extensive compared to comments and tweets.
It may be more helpful to train a model on a publicly available dataset that is more in line with YouTube comments (e.g. tweets). You can then take that pre-trained model and fine-tune it on your YouTube comments dataset. Utilising pre-trained word embeddings such as GloVe or word2vec can be useful as well, as sketched below.
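A minimal sketch of wiring pre-trained GloVe vectors into a Keras Embedding layer, assuming a downloaded glove.6B.100d.txt file plus a word_index and maxlen from your own preprocessing (names are illustrative):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

embedding_dim = 100
# load GloVe vectors into a {word: vector} map
embeddings = {}
with open('glove.6B.100d.txt', encoding='utf8') as f:
    for line in f:
        values = line.split()
        embeddings[values[0]] = np.asarray(values[1:], dtype='float32')

# build an embedding matrix aligned with your own word_index
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    if word in embeddings:
        embedding_matrix[i] = embeddings[word]

model = Sequential()
model.add(Embedding(len(word_index) + 1, embedding_dim,
                    weights=[embedding_matrix], input_length=maxlen,
                    trainable=False))  # freeze first, unfreeze later to fine-tune
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])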
Alternatively, you can look into using NLTK to analyse the comments instead.
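For example, NLTK's VADER sentiment analyser is lexicon-based, tuned for short social-media text, and needs no training data at all; a quick sketch:

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the VADER lexicon
sia = SentimentIntensityAnalyzer()

for comment in ['bad video', 'fantastic amazing', 'good great', 'terrible bad']:
    scores = sia.polarity_scores(comment)
    # compound is in [-1, 1]: negative values indicate negative sentiment
    print(comment, scores['compound'])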
Upvotes: 1