Spyros Spyropoulos

Reputation: 31

Manual predictions of neural net go wrong

I have a dataset (CSV) in the format shown below:

First column: random integers

Second column: The class of each integer (called bins)


The bins were created during preprocessing. For example, integers between 1000 and 1005 belong to bin 0, integers between 1006 and 1011 belong to bin 1, and so on.
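The binning rule can be sketched as below. This is only an illustration: it assumes every bin is exactly 6 integers wide and that the first bin starts at 1000, which matches the two example ranges but may not hold for the whole dataset. The function name and its parameters are hypothetical.

```python
def bin_index(value, start=1000, width=6):
    """Return the bin number for an integer.

    Assumes fixed-width bins: [start, start + width - 1] is bin 0,
    the next `width` integers are bin 1, and so on. Both `start`
    and `width` are assumptions taken from the two example ranges.
    """
    return (value - start) // width


# Quick check against the two example ranges
examples = [1000, 1005, 1006, 1011, 1012]
bins = [bin_index(v) for v in examples]
print(bins)
```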

The target column for my neural network is the column of bins (second column).

I use OneHotEncoding for my target column and transform every bin number to a binary vector. I have 3557 different bins (classes).

I trained it and evaluated it, with an accuracy of 99.7% as a result.

import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder
from keras import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split

df = pd.read_csv("/dbfs/FileStore/tables/export78.csv")

onehotencoder = OneHotEncoder(categorical_features = [1])
data2 = onehotencoder.fit_transform(df).toarray()
dataset = pd.DataFrame(data2)

X= dataset.iloc[:,3557].astype(float)
y= dataset.iloc[:,0:3557].astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)


classifier = Sequential()
#First Hidden Layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal', input_dim=1))
#Second  Hidden Layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal'))
#Output Layer
classifier.add(Dense(3557, activation='sigmoid', kernel_initializer='random_normal'))

#Compiling the neural network
classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics=['accuracy'])

#Fitting the data to the training dataset
classifier.fit(X_train,y_train, batch_size=50, epochs=10)

accr = classifier.evaluate(X_test, y_test)
print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0] ,accr[1]))

classifier.save("model.h67")


data1 = np.array(X_test)
List = [data1]
model = tf.keras.models.load_model("model.h67")
prediction = model.predict([(data1)])
target = (np.argmax(prediction, axis=0))
dataset1 = pd.DataFrame(target)
display(dataset1)

THE PROBLEM:

When I try to predict manually using my model, I can't get correct results. As prediction input I give a CSV with only one column of random integers, and I want the bins they belong to as a result.

Upvotes: 1

Views: 231

Answers (2)

desertnaut

Reputation: 60388

There are several issues with your code.

To start with:

I trained it and evaluated it, with an accuracy of 99.7% as a result.

This is a known issue (spuriously high accuracy) that arises when binary_crossentropy loss is erroneously used for a multi-class classification problem.
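To see why the number can be misleading, here is a minimal NumPy sketch (not the Keras internals themselves) comparing an element-wise "binary" accuracy with a per-sample "categorical" accuracy on one-hot targets. The fake predictions, the sample count, and the variable names are made up for illustration; the "model" here has learned nothing and outputs a constant small value everywhere.

```python
import numpy as np

n_samples, n_classes = 200, 3557
rng = np.random.default_rng(0)

# Random true classes, one-hot encoded
labels = rng.integers(0, n_classes, size=n_samples)
y_true = np.zeros((n_samples, n_classes))
y_true[np.arange(n_samples), labels] = 1.0

# Made-up predictions from a "model" that outputs ~0 for every class
y_pred = np.full((n_samples, n_classes), 0.01)

# Element-wise accuracy: each of the 3557 outputs is thresholded at 0.5
# and compared with the matching 0/1 target entry
binary_acc = np.mean((y_pred > 0.5) == (y_true == 1.0))

# Per-sample accuracy: is the highest-scoring class the true class?
categorical_acc = np.mean(np.argmax(y_pred, axis=1) == labels)

print("element-wise accuracy:", binary_acc)
print("per-sample accuracy:  ", categorical_acc)
```

Because each one-hot row contains 3556 zeros and a single one, predicting "all zeros" is already right on almost every element, so the element-wise figure is close to 1 even though almost no sample is classified correctly.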

Second, you are also erroneously using activation='sigmoid' in your last layer, where it should be activation='softmax'.
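As a rough illustration of the difference, the plain-Python sketch below applies both functions to the same made-up raw scores. Sigmoid squashes each score independently, so the outputs need not sum to 1; softmax normalizes across all outputs, so they form a probability distribution over the classes. The scores and helper names are arbitrary.

```python
import math


def sigmoid(z):
    """Squash a single score into (0, 1), independently of the others."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    """Normalize a list of scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up raw outputs for 4 classes

sigmoid_out = [sigmoid(z) for z in logits]
softmax_out = softmax(logits)

print("sigmoid sum:", sum(sigmoid_out))  # not constrained to 1
print("softmax sum:", sum(softmax_out))  # 1, up to rounding
```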

Third, get rid of all these activation='sigmoid' in the rest of your layers, and replace them with relu.

Last, you should get rid of all these kernel_initializer='random_normal' statements in your model layers; leave the argument undefined, so that it defaults to the (far superior) glorot_uniform.

All in all, here is what your model should look like:

classifier = Sequential()
classifier.add(Dense(3557, activation='relu', input_dim=1))
classifier.add(Dense(3557, activation='relu'))
classifier.add(Dense(3557, activation='softmax'))

classifier.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

That's very general advice, just for starters; a 3557-class problem is not trivial, nor is it clear why you have chosen to go with 3 layers, all of them with the same number (3557) of nodes. Experiment with the architecture, keeping in mind the above points.

Upvotes: 1

Tinu

Reputation: 2523

Do you get an error message or just wrong predictions? This is not clear from your question.

Try:

prediction = model.predict(data1)
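Something else that may be worth checking is how the predictions are decoded. The question's code calls np.argmax(prediction, axis=0), which looks for the maximum down each column (one result per class) instead of along each row (one result per input). The sketch below uses a small hand-written array as a stand-in for the output of model.predict, so the numbers are made up; it also shows reshaping a 1-D input to (n_samples, 1), which is an assumption about what a model built with input_dim=1 expects.

```python
import numpy as np

# Input: one feature per sample, shaped (n_samples, 1)
x_new = np.array([1003, 1008]).reshape(-1, 1)
print(x_new.shape)  # (2, 1)

# Stand-in for model.predict(...): 4 samples (rows) x 5 classes (columns)
prediction = np.array([
    [0.10, 0.60, 0.10, 0.10, 0.10],
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.10, 0.10, 0.60, 0.10, 0.10],
])

# axis=1: best class for each sample, one value per input row
per_sample = np.argmax(prediction, axis=1)

# axis=0: best sample for each class, one value per class column
per_class = np.argmax(prediction, axis=0)

print(per_sample)        # predicted bin for each of the 4 inputs
print(per_class.shape)   # (5,), i.e. one entry per class, not per input
```

With axis=1 the result has one predicted bin per input integer, which is what the question asks for.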

Edit:

I have 3557 different bins (classes).

classifier.compile(optimizer ='adam',loss='binary_crossentropy', metrics=['accuracy'])

In that case, binary_crossentropy is not the right loss function; try categorical_crossentropy.

Upvotes: 1
