Reputation: 139
I am new to Deep Learning and Keras. I have created a model that trains on the ASL (American Sign Language) dataset, which has nearly 80,000 training images and 1,500 test images. I have also appended some more classes, i.e. hand-sign digits from 0-9, so in total I have 39 classes (0-9 and A-Z). My task is to train on this dataset and use the model for prediction. The input for prediction will be a frame from a webcam in which I show the hand sign.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), input_shape = (100, 100, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Conv2D(32, (3, 3), activation = 'relu'))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
classifier.add(Dense(units = 128, activation = 'relu'))
classifier.add(Dense(units = 39, activation = 'softmax'))
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True)
test_datagen = ImageDataGenerator(rescale = 1./255)
training_set = train_datagen.flow_from_directory('train',
                                                 target_size = (100, 100),
                                                 batch_size = 128,
                                                 class_mode = 'categorical')
test_set = test_datagen.flow_from_directory('test',
                                            target_size = (100, 100),
                                            batch_size = 128,
                                            class_mode = 'categorical')
classifier.fit_generator(training_set,
                         steps_per_epoch = 88534,
                         epochs = 10,
                         validation_data = test_set,
                         validation_steps = 1418)
The ASL dataset images are of size 200x200 and the number-sign images are of size 64x64. After training for 5 epochs with a validation accuracy of 96%, I am still not able to get good predictions when I run the model on a video.
import cv2
import traceback
import numpy as np
from keras.models import load_model

classifier = load_model('asl_original.h5')
classifier.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

cam = cv2.VideoCapture(0)
while(1):
    try:
        ret, frame = cam.read()
        frame = cv2.flip(frame, 1)
        roi = frame[100:400, 200:500]
        cv2.rectangle(frame, (200, 100), (500, 400), (0, 255, 0), 2)
        cv2.imshow('frame', frame)
        cv2.imshow('roi', roi)
        img = cv2.resize(roi, (100, 100))
        img = np.reshape(img, [1, 100, 100, 3])
        classes = classifier.predict_classes(img)
        print(classes)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    except Exception:
        traceback.print_exc()
        pass
I don't understand why I am not able to get accurate predictions even after training on such a large dataset. What changes do I need to make so that I get accurate predictions for all 39 classes?
Links to the datasets: ASL DATASET and Hand sign for numbers
Upvotes: 0
Views: 321
Reputation: 111
In classifier.compile you use loss='binary_crossentropy', which is only appropriate when the labels are binary (exactly two classes). For multi-class classification you must use a loss function that matches the number and format of your labels: 'categorical_crossentropy' for one-hot labels (which is what flow_from_directory with class_mode='categorical' produces), or 'sparse_categorical_crossentropy' for integer labels.
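For example, since your generators use class_mode = 'categorical' (one-hot labels), a minimal sketch of the corrected compile call could look like this, keeping your classifier variable and the 'adam' optimizer:

# Recompile with a multi-class loss that matches the one-hot labels
# produced by flow_from_directory(class_mode='categorical').
classifier.compile(optimizer = 'adam',
                   loss = 'categorical_crossentropy',
                   metrics = ['accuracy'])
# If the labels were plain integer class indices instead,
# 'sparse_categorical_crossentropy' would be the analogous choice.

The same loss should also be used where you recompile the loaded model in your webcam script.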
Try reading this useful blog post, which explains every loss function in detail.
Upvotes: 2