A. Arfaoui

Reputation: 43

Neural net with duplicated inputs - Keras

I have a dataset of N videos. Each video is characterized by some metrics (which will be the inputs to a neural net). My goal is to predict the score that a person will give when he or she watches the video.

The problem is that each video in my dataset was watched more than once, by different subjects, so I had to duplicate the same metrics (inputs) once for every time the video was watched in order to keep all the scores given by the subjects.

I built an MLP model to predict the scores, but when I calculate the RMSE it's always higher than 0.7.

I want to know whether a dataset like this affects the performance of my model, and if so, how I can deal with it.

Here is what the dataset looks like:

The first 5 columns are the inputs and the last one is the score given by the subject. Note that all of them are normalized.

Here is my Model:

    import numpy
    from keras.models import Sequential
    from keras.layers import Dense

    def mlp_model():
        # Create model: 5 inputs, four hidden layers of 100 ReLU units, 1 linear output
        model = Sequential()
        model.add(Dense(100, input_dim=5, kernel_initializer='normal', activation='relu'))
        model.add(Dense(100, kernel_initializer='normal', activation='relu'))
        model.add(Dense(100, kernel_initializer='normal', activation='relu'))
        model.add(Dense(100, kernel_initializer='normal', activation='relu'))
        model.add(Dense(1, kernel_initializer='normal'))
        # Compile model
        model.compile(loss='mean_squared_error', optimizer='adam')
        return model

    # Seed NumPy's random number generator
    seed = 100
    numpy.random.seed(seed)
    myModel = mlp_model()
    # x_train, y_train, x_test and the plot_losses callback are defined elsewhere
    myModel.fit(x=x_train, y=y_train, batch_size=10, epochs=45,
                validation_split=0.3, shuffle=True, callbacks=[plot_losses])
    predictions = myModel.predict(x_test)
    print(predictions)

Upvotes: 0

Views: 851

Answers (1)

dennlinger

Reputation: 11450

Your problem statement reveals an inherent flaw in the design. As you correctly pointed out, the same inputs show up with different scores, and your features give you no way of knowing who the user is, how she has rated other videos, or how she will rate the current video.

It would be helpful to explain what your current input values are, and whether they could differ between users at all. For example, a metric like "time spent watching the video" might be different for different users.

On a larger scale, ask yourself whether you could predict the rating yourself with a completely deterministic judgement, i.e. given the same input, would you consistently arrive at the same rating?

Since that is currently not the case, I would say that you should invest more time in finding a suitable approach to your problem, for example recommender systems, although those also require a lot of additional input information.

Alternatively, you could try to find more input data that specifically identifies the users and allows you to make more suitable predictions. Even then, it will be hard to base a reasonable prediction on such proxy metrics, since you might end up creating an unwanted bias in your preprocessing.
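As a rough illustration of what "identifying the users" could look like, here is a minimal sketch that appends a one-hot subject indicator to the video metrics. It assumes you have a subject ID for every row, which your question does not show; `add_subject_one_hot` and `subject_ids` are made-up names, not part of Keras or your code.

```python
import numpy as np

def add_subject_one_hot(features, subject_ids):
    """Append a one-hot encoding of the (hypothetical) subject ID to each row."""
    features = np.asarray(features, dtype=float)
    # Map every distinct subject ID to an integer index 0..n_subjects-1
    subjects, index = np.unique(np.asarray(subject_ids), return_inverse=True)
    one_hot = np.eye(len(subjects))[index.ravel()]
    return np.hstack([features, one_hot]), subjects

# Toy data: the same video (identical metrics) rated by two different subjects
x = [[0.1, 0.2], [0.1, 0.2], [0.5, 0.4]]
ids = ["s1", "s2", "s1"]
x_with_subject, subjects = add_subject_one_hot(x, ids)
print(x_with_subject.shape)  # (3, 4): 2 metrics + 2 subject columns
```

With such features the first layer's `input_dim` would have to grow by the number of subjects, and an encoding like this can only help for subjects that already appear in the training data.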

In any case, getting much better results with the current format of the input is very unlikely.
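To put a number on that limit: when identical inputs carry different scores, the prediction that minimizes squared error for those rows is the mean of their scores. The spread of the scores around that per-video mean is therefore an RMSE floor that no model using only these inputs can beat on the same data. A minimal NumPy sketch, where `rmse_floor` is a name I made up and the toy arrays merely stand in for your data:

```python
import numpy as np

def rmse_floor(features, scores):
    """Lowest RMSE reachable on this data by any function of `features` alone."""
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rows with identical inputs (the same video) get the same group index
    _, group = np.unique(features, axis=0, return_inverse=True)
    group = group.ravel()
    # Mean score per group, i.e. the best constant prediction for each video
    group_mean = np.bincount(group, weights=scores) / np.bincount(group)
    residuals = scores - group_mean[group]
    return float(np.sqrt(np.mean(residuals ** 2)))

# Toy data: two videos, each watched three times
x = [[0.1, 0.2]] * 3 + [[0.5, 0.4]] * 3
y = [1, 2, 3, 4, 4, 4]
print(rmse_floor(x, y))  # about 0.577
```

If this value on your training set is already close to 0.7, the MLP is not the bottleneck; the disagreement between subjects is.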

Upvotes: 1
