jim jarnac

Reputation: 5152

Keras: model.predict returns an array with the wrong shape

I am having an issue with the shape of the output of model.predict.

Let's consider the following simple dummy model:

import numpy as np
import pandas as pd
import tensorflow as tf
import string
from tensorflow.keras import layers

### Creating pseudo data
a = pd.DataFrame(np.random.randn(1000, 10), columns = list(string.ascii_lowercase[:10]))

### Splitting data between train and test set
train = a.sample(frac=0.75, random_state=0)
test = a.drop(train.index)

### Creating features columns
f = []
for c in a.columns:
    if c == 'j':
        continue
    x = tf.feature_column.numeric_column(c)
    f.append(x)

l = layers.DenseFeatures(f)

### Creating model
model = tf.keras.models.Sequential()
model.add(l)  
model.add(tf.keras.layers.Dense(
    units = 64,
    activation = 'relu',
    name = 'hidden1'
    )
)
model.add(tf.keras.layers.Dense(
    units = 128,
    activation = 'relu',
    name = 'hidden2'
    )
)


### Prepping training data for input in the model
features = {name : np.array(value) for name, value in train.items()}
label = np.array(features.pop('j'))

lr = 0.001
batch_size = 40
epochs = 40

model.compile(optimizer = tf.keras.optimizers.RMSprop(learning_rate = lr),
    loss = 'mean_squared_error',
    metrics = [tf.keras.metrics.MeanSquaredError()])

model.fit(x=features, y=label, batch_size = batch_size, epochs = epochs, shuffle = True)

features_predict = {name : np.array(value) for name, value in test.items()}
label_predict = np.array(features_predict.pop('j'))

p = model.predict(x=features_predict)

Now, if I inspect p:

p.shape

it returns (250, 128), whereas I would expect it to predict the j value and have shape (250, 1).

What am I doing wrong?

Upvotes: 0

Views: 1125

Answers (1)

Alexandr Dibrov

Reputation: 156

The last layer of your model is a Dense layer with 128 units, so it returns a tensor of shape [batch_size, 128]. With 250 test rows, that gives (250, 128), so everything is working as expected. If you want a single value per sample instead, change the number of units in the last layer to 1.
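To make the shape reasoning concrete, here is a minimal NumPy sketch of the layer stack. It is not Keras itself: the `dense` helper and its random weights are assumptions made purely for illustration, relying only on the fact that a dense layer computes `activation(x @ W + b)`, so the last dimension of its output equals its number of units.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, units, relu=True):
    """Hypothetical stand-in for a dense layer: activation(x @ W + b)."""
    w = rng.standard_normal((x.shape[-1], units))  # weights: (inputs, units)
    b = np.zeros(units)                            # one bias per unit
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y

x = rng.standard_normal((250, 9))   # 250 test rows, 9 feature columns (a..i)
h1 = dense(x, 64)                   # corresponds to 'hidden1'
h2 = dense(h1, 128)                 # corresponds to 'hidden2', the last layer in the question
out = dense(h2, 1, relu=False)      # assumed extra 1-unit linear output layer

print(h2.shape)   # (250, 128)
print(out.shape)  # (250, 1)
```

In the Keras model from the question, the equivalent change would presumably be either setting `units = 1` on the last layer, or keeping `hidden2` and appending something like `model.add(tf.keras.layers.Dense(units = 1))` after it. A linear (no activation) output is a common choice when regressing a real-valued target such as `j`.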

Upvotes: 1
