DDD

Reputation: 121

Keras predict() returns a better accuracy than evaluate()

I set up a model with Keras, trained it on a dataset of 3 records, and then tested the resulting model with evaluate() and predict(), using the same test set for both functions (the test set has 100 records and shares no record with the training set, for whatever that is worth given the size of the two datasets). The dataset is composed of 5 files: 4 files each represent a different temperature sensor that collects 60 measurements per minute (so each row contains 60 measurements), while the last file contains the class labels I want to predict (3 classes: 3, 20 or 100).

This is the model I'm using:

from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, GlobalAveragePooling1D, Dropout, Dense

n_sensors, t_periods = 4, 60

model = Sequential()
model.add(Conv1D(100, 6, activation='relu', input_shape=(t_periods, n_sensors)))
model.add(Conv1D(100, 6, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(160, 6, activation='relu'))
model.add(Conv1D(160, 6, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

I train it with: self.model.fit(X_train, y_train, batch_size=3, epochs=5, verbose=1)

Then I use evaluate: self.model.evaluate(x_test, y_test, verbose=1)

And predict:

import numpy as np

predictions = self.model.predict(data)
# find the index of the highest-probability class for the (single) example
result = np.where(predictions[0] == np.amax(predictions[0]))
if result[0][0] == 0:
    return '3'
elif result[0][0] == 1:
    return '20'
else:
    return '100'
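
As an aside, the np.where/np.amax lookup above is just the argmax of the prediction row; a slightly more compact equivalent (a sketch, assuming the index-to-class mapping above is correct) would be:

class_names = ['3', '20', '100']   # assumed to match the order produced when the labels were factorized
return class_names[int(np.argmax(predictions[0]))]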

I compare each predicted class with the actual label and then calculate correct guesses / total examples, which should be equivalent to the accuracy reported by evaluate(). Here's the code:

correct = 0
for profile in self.profile_file: # profile_file is an opened file with the true labels
    ts1 = self.ts1_file.readline()
    ts2 = self.ts2_file.readline()
    ts3 = self.ts3_file.readline()
    ts4 = self.ts4_file.readline()
    data = ts1, ts2, ts3, ts4
    test_data = self.dl.transform(data) # see the last block of code I posted
    prediction = self.model.predict(test_data) # mapped to a class string as shown above
    if prediction == label: # label is the true class for this row, taken from `profile`
        correct += 1
acc = correct / 100 # 100 is the number of total examples
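
A quick cross-check, assuming x_test and y_test are the same arrays passed to evaluate() above, is to call predict() once on the whole test set and compute the accuracy with argmax; if this matches evaluate(), the per-row preprocessing path is the likely culprit:

import numpy as np

probs = self.model.predict(x_test)                 # shape (100, 3): one probability row per example
pred_idx = np.argmax(probs, axis=1)                # predicted class index per example
true_idx = np.argmax(y_test, axis=1)               # y_test is one-hot encoded via to_categorical
acc_from_predict = np.mean(pred_idx == true_idx)   # should equal the accuracy reported by evaluate()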

Data fed to evaluate() is taken from this function:

import os
import pandas as pd
from keras.utils import np_utils

label = pd.read_csv(os.path.join(self.testDir, 'profile.txt'), sep='\t', header=None)
label = np_utils.to_categorical(label[0].factorize()[0])
data = [os.path.join(self.testDir, 'TS2.txt'), os.path.join(self.testDir, 'TS1.txt'),
        os.path.join(self.testDir, 'TS3.txt'), os.path.join(self.testDir, 'TS4.txt')]
df = pd.DataFrame()
for txt in data:
    read_df = pd.read_csv(txt, sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1, 4, 60).transpose(0, 2, 1)
return df, label
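
For reference, the reshape/transpose at the end is what produces the (samples, 60, 4) layout that input_shape=(t_periods, n_sensors) expects; a tiny shape check (a sketch, assuming 100 rows per sensor file):

import numpy as np

stacked = np.zeros((4 * 100, 60))                       # 4 sensor files appended, 100 rows each, 60 columns
shaped = stacked.reshape(-1, 4, 60).transpose(0, 2, 1)
print(shaped.shape)                                     # (100, 60, 4) -> (samples, t_periods, n_sensors)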

Data fed to predict(), on the other hand, is taken from this function:

from io import StringIO

df = pd.DataFrame()
for txt in data: # data is the tuple of raw lines (ts1, ts2, ts3, ts4) read above
    read_df = pd.read_csv(StringIO(txt), sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1, 4, 60).transpose(0, 2, 1)
return df

The accuracies yielded by evaluate() and predict() are always different: the largest gap I have seen was evaluate() reporting 78% accuracy while predict() gave 95%. The only difference between the two is that I run predict() on one example at a time, while evaluate() takes the entire dataset at once, but that should make no difference. How can this be?

UPDATE 1: It seems that the problem is in how I prepare my data. For predict(), I transform only one line at a time from each file using the last block of code I posted, while for evaluate() I transform the entire files using the other function shown. Why would that be different? It seems to me that I'm applying the exact same transformation; the only difference is the number of rows transformed.
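
One way the number of rows can matter: if self.__predict_scale derives its scaling from statistics of the rows it is given (an assumption here, since its body isn't shown), then scaling one row in isolation and scaling the whole file produce different values. A minimal sketch of that effect:

import pandas as pd

# hypothetical scaler that, like min-max scaling, depends on the rows it sees
def scale(col):
    return (col - col.min()) / (col.max() - col.min() + 1e-9)

full = pd.DataFrame({'sensor': [1.0, 5.0, 9.0]})
one_row = full.iloc[[1]]

print(full.apply(scale).iloc[1].values)   # row 1 scaled against the whole file -> ~[0.5]
print(one_row.apply(scale).values)        # the same row scaled alone -> [[0.]]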

Upvotes: 8

Views: 1689

Answers (1)

sparkles

Reputation: 174

This question was already answered here.

What happens is that when you evaluate the model, since your loss function is categorical_crossentropy, metrics=['accuracy'] calculates categorical_accuracy.

But predict defaults to binary_accuracy.

So essentially you are calculating categorical accuracy with evaluate() and binary accuracy with predict(). This is the reason they are so widely different.

The difference between categorical_accuracy and binary_accuracy is that categorical_accuracy checks whether all the outputs match your y_test, while binary_accuracy checks whether each of your outputs matches your y_test.

Example (single row):

prediction = [0,0,1,1,0]
y_test = [0,0,0,1,0]

categorical_accuracy = 0% 

Since one output does not match, the categorical_accuracy is 0%.

binary_accuracy = 80% 

Even though one output doesn't match, the remaining 80% do match, so the accuracy is 80%.
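
To make the arithmetic concrete, here is a small numpy sketch of the two numbers above, following the definitions as the answer describes them (an illustration, not Keras internals):

import numpy as np

prediction = np.array([0, 0, 1, 1, 0])
y_test     = np.array([0, 0, 0, 1, 0])

categorical_acc = float(np.all(prediction == y_test))    # 1 only if every output matches -> 0.0 (0%)
binary_acc = float(np.mean(prediction == y_test))        # fraction of matching outputs -> 0.8 (80%)
print(categorical_acc, binary_acc)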

Upvotes: 5
