Reputation: 121
I've seen a couple of questions with a similar problem, but none of them solved mine. I'm trying to fit a neural network in Keras to a dataset with 22 input features for binary classification. The catch is that I only have 195 training samples. I know it's a small dataset, and I don't know whether it's possible to fit a model with reasonable accuracy (I'm aiming for >95%). Right now my model only ever outputs 1 and gets 75% accuracy, because 75% of my dataset is positive cases. Here's the code I have:
data = pd.read_csv("") #filename omitted, but it loads properly
Y = data['status']
X = data.drop(['status', 'name'], axis = 1)
scaler = MinMaxScaler()
X = scaler.fit_transform(X) #scale only after X has been defined
xTrain, xTest, yTrain, yTest = train_test_split(X, Y, train_size = 0.8)
model = Sequential()
model.add(Dense(48, input_shape=(22,), activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation = 'softmax'))
optim = keras.optimizers.adam(lr=0.0001)
model.compile(optimizer = optim, loss = 'binary_crossentropy', metrics = ['accuracy'])
model.fit(xTrain, yTrain, epochs = 20, batch_size = 5, validation_data = (xTest, yTest))
I've tried adding more hidden layers, increasing the number of training epochs, and both raising and lowering the optimizer's learning rate, but the accuracy stays the same. Here's the link to the dataset: https://www.dropbox.com/s/c4td650b4z7aizc/fixed.xlsx?dl=0
Upvotes: 0
Views: 168
Reputation: 1702
Some things you should try to get better accuracy:
Do not simply feed the dataset to the NN as it is. Do some data preparation first, such as balancing the response class. Take a look at sampling techniques such as undersampling, oversampling, and SMOTE. Accuracy often improves when the dataset has a balanced class distribution.
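As a rough illustration of the balancing idea, here is a minimal pure-Python sketch of two common options: per-class weights and random oversampling of the minority class. The helper names `balanced_class_weights` and `random_oversample` and the toy data are made up for this example; they are not part of Keras or of the question's code. The weights use the common "balanced" heuristic, n_samples / (n_classes * class_count).

```python
import random
from collections import Counter

def balanced_class_weights(labels):
    """Hypothetical helper: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of the smaller
    classes until every class has as many rows as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy training split with a 75% / 25% class ratio (illustrative values only,
# not the real dataset): 156 rows, i.e. 80% of 195.
labels = [1] * 117 + [0] * 39
rows = [[float(i)] for i in range(len(labels))]

weights = balanced_class_weights(labels)
bal_rows, bal_labels = random_oversample(rows, labels)

print(weights)
print(Counter(bal_labels))
```

A dictionary like `weights` can be passed to Keras through the `class_weight` argument of `model.fit`. If you oversample instead, do it on the training split only, after `train_test_split`, so that duplicated rows cannot end up in the test set.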
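Instead of activation = 'softmax', you should use the sigmoid activation function. Softmax normalizes across the units of a layer, so with a single output unit it always returns exactly 1, no matter what the input is. That is why your model only ever predicts 1.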
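The softmax point can be checked numerically without Keras. The sketch below implements both functions with the standard `math` module (the function names are just for this example) and applies them to a single logit, which is what a `Dense(1)` layer produces per sample.

```python
import math

def softmax(logits):
    """Softmax over a list of logits (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Logistic sigmoid of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

# With one output unit, softmax divides exp(z) by itself, so the result is
# always 1.0. Sigmoid maps the same logit to a value between 0 and 1.
for z in (-3.0, 0.0, 2.5):
    print(z, softmax([z])[0], sigmoid(z))
```

In the question's model this corresponds to changing the last layer to `Dense(1, activation = 'sigmoid')` and keeping `binary_crossentropy` as the loss.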
Apart from these, you should try several other architectures, learning rates, numbers of epochs, batch sizes, optimizers, etc.
Upvotes: 1