OAK

Reputation: 3166

Keras - CNN - adjust dataset - remove biased class and data augmentation attempt

I have a predicament with the model I am developing, based on a popular dataset of skin-cancer images (HAM10000). There are two points I'd like some guidance on:

A.

The original dataset contains over 10K images, of which almost 7,000 belong to just one of the seven classes. I created a subset of 4,948 random images and ran a function that converts the images into a list of [image, class] pairs, while dismissing any image belonging to class 5 (the class with the ~6,800 images). The thought process was to even out the distribution across the classes.

Re-running the original model with an output Dense layer of 6 neurons instead of 7 raises an error.

Am I missing a step to 'indicate' to the model that there are only six possible classes? The model runs only when the output layer has seven neurons.

Error:

Train on 1245 samples, validate on 312 samples
Epoch 1/30
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-138-8a3b40a69e37> in <module>
     25              metrics=["accuracy"])
     26 
---> 27 model.fit(X_train, y_train, batch_size=32, epochs=30, validation_split=0.2)

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    778           validation_steps=validation_steps,
    779           validation_freq=validation_freq,
--> 780           steps_name='steps_per_epoch')
    781 
    782   def evaluate(self,

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_arrays.py in model_iteration(model, inputs, targets, sample_weights, batch_size, epochs, verbose, callbacks, val_inputs, val_targets, val_sample_weights, shuffle, initial_epoch, steps_per_epoch, validation_steps, validation_freq, mode, validation_in_fit, prepared_feed_values_from_dataset, steps_name, **kwargs)
    361 
    362         # Get outputs.
--> 363         batch_outs = f(ins_batch)
    364         if not isinstance(batch_outs, list):
    365           batch_outs = [batch_outs]

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6).  Label values: 1 1 2 4 2 1 2 1 2 1 2 2 4 2 2 1 3 1 4 6 0 2 4 2 0 4 2 4 4 0 2 4
     [[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]
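
The message is the key part: with a 6-unit softmax and sparse (integer) labels, valid label values are 0 to 5, yet a 6 is still present. A quick diagnostic (a minimal sketch, assuming y_train holds the integer labels) confirms it:

import numpy as np

labels = np.array(y_train)
print(np.unique(labels))  # with class 5 filtered out, this prints [0 1 2 3 4 6]
# the remaining label 6 falls outside the expected range [0, 6)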

B.

I'm attempting to add data augmentation, as the dataset is relatively small given the number of classes and how sparsely the images are spread across them. Once I try to run the generator, I receive the error message below, which suggests that something is wrong with one of the variables in the validation_data tuple. I cannot work out what the issue is.

Example values from the test set look like:

[[[[0.41568627]
   [0.4       ]
   [0.43137255]
   ...
   [0.54509804]
   [0.54901961]
   [0.54509804]]

  [[0.42352941]
   [0.43137255]
   [0.43921569]
   ...
   [0.56078431]
   [0.54117647]
   [0.55294118]]

  [[0.41960784]
   [0.41960784]
   [0.45490196]
   ...
   [0.51764706]
   [0.57254902]
   [0.50588235]]

  ...

  [[0.30980392]
   [0.36470588]
   [0.36470588]
   ...
   [0.47058824]
   [0.44705882]
   [0.41960784]]

  [[0.29803922]
   [0.31764706]
   [0.34509804]
   ...
   [0.45098039]
   [0.43921569]
   [0.4       ]]

  [[0.25882353]
   [0.30196078]
   [0.31764706]
   ...
   [0.45490196]
   [0.42745098]
   [0.36078431]]]


 [[[0.60784314]
   [0.59215686]
   [0.56862745]
   ...
   [0.59607843]
   [0.63921569]
   [0.63529412]]

  [[0.6627451 ]
   [0.63137255]
   [0.62352941]
   ...
   [0.67843137]
   [0.60784314]
   [0.63529412]]

  [[0.62745098]
   [0.65098039]
   [0.6       ]
   ...
   [0.61568627]
   [0.63921569]
   [0.67058824]]

  ...

  [[0.62352941]
   [0.6       ]
   [0.59607843]
   ...
   [0.6627451 ]
   [0.71372549]
   [0.6745098 ]]

  [[0.61568627]
   [0.58431373]
   [0.61568627]
   ...
   [0.67058824]
   [0.65882353]
   [0.68235294]]

  [[0.61176471]
   [0.60392157]
   [0.61960784]
   ...
   [0.65490196]
   [0.6627451 ]
   [0.66666667]]]]

[2, 1, 4, 4, 2]
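
One thing to note: the labels above print as a plain Python list, while Keras expects NumPy arrays in the validation_data tuple. A small sanity-check sketch (assuming X_test and y_test come from a split not shown here):

import numpy as np

X_test = np.array(X_test).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y_test = np.array(y_test)
print(X_test.shape, y_test.shape, np.unique(y_test))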

Error:

Epoch 1/10
  1/155 [..............................] - ETA: 11s - loss: 1.7916 - acc: 0.3000
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-139-8f19a958861f> in <module>
     12 history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size)                           
     13                              ,epochs = 10, validation_data = (X_test, y_test),
---> 14                               steps_per_epoch= X_train.shape[0]// batch_size
     15                              )
     16 #epochs = epochs, validation_data = (X_test, y_test),

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1431         shuffle=shuffle,
   1432         initial_epoch=initial_epoch,
-> 1433         steps_name='steps_per_epoch')
   1434 
   1435   def evaluate_generator(self,

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training_generator.py in model_iteration(model, data, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch, mode, batch_size, steps_name, **kwargs)
    262 
    263       is_deferred = not model._is_compiled
--> 264       batch_outs = batch_function(*batch_data)
    265       if not isinstance(batch_outs, list):
    266         batch_outs = [batch_outs]

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight, reset_metrics)
   1173       self._update_sample_weight_modes(sample_weights=sample_weights)
   1174       self._make_train_function()
-> 1175       outputs = self.train_function(ins)  # pylint: disable=not-callable
   1176 
   1177     if reset_metrics:

/anaconda3/lib/python3.7/site-packages/tensorflow/python/keras/backend.py in __call__(self, inputs)
   3290 
   3291     fetched = self._callable_fn(*array_vals,
-> 3292                                 run_metadata=self.run_metadata)
   3293     self._call_fetch_callbacks(fetched[-len(self._fetches):])
   3294     output_structure = nest.pack_sequence_as(

/anaconda3/lib/python3.7/site-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: Received a label value of 6 which is outside the valid range of [0, 6).  Label values: 0 1 6 4 2 4 2 0 1 2
     [[{{node loss_15/activation_63_loss/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits}}]]

Code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

import sys
import os
import cv2

DATA_DIR = "/Users/namefolder/PycharmProjects/skin-cancer/HAM10000_images_part_1"

metadata = pd.read_csv(os.path.join(DATA_DIR, 'HAM10000_metadata.csv'))

lesion_type_dict = {'nv': 'Melanocytic nevi',
                    'mel': 'Melanoma',
                    'bkl': 'Benign keratosis-like lesions',
                    'bcc': 'Basal cell carcinoma',
                    'akiec': 'Actinic keratoses',
                    'vasc': 'Vascular lesions',
                    'df': 'Dermatofibroma'}

metadata['cell_type'] = metadata['dx'].map(lesion_type_dict.get)
metadata['dx_code'] = pd.Categorical(metadata['dx']).codes

# keep only the image-id and diagnosis-type (categorical) columns
metadata = metadata[['image_id', 'dx', 'dx_type', 'dx_code']]

training_data = []

IMG_SIZE = 50

# preparing training data: load each image as grayscale, resize it,
# and store [image, label] pairs, skipping class 5 (the over-represented class)
def creating_training_data(path):
    for img in os.listdir(path):
        try:
            img_array = cv2.imread(os.path.join(path, img), cv2.IMREAD_GRAYSCALE)
            new_array = cv2.resize(img_array, (IMG_SIZE, IMG_SIZE))
            for index, row in metadata.iterrows():
                if (img == row['image_id'] + '.jpg') & (row['dx_code'] != 5):
                    try:
                        training_data.append([new_array, row['dx_code']])
                    except Exception as ee:
                        pass
        except Exception as e:
            pass

    return training_data

training_data = creating_training_data(DATA_DIR)

import random

random.shuffle(training_data)

# Splitting data into X features and y labels
X_train = []
y_train = []
for features, label in training_data:
    X_train.append(features)
    y_train.append(label)

# Reshaping into the (samples, height, width, channels) format expected by Keras
X_train = np.array(X_train).reshape(-1, IMG_SIZE, IMG_SIZE, 1)
y_train = np.array(y_train)

# Normalize pixel values to [0, 1]
X_train = X_train / 255.0

# model configuration
model = Sequential()
model.add(Conv2D(64, (3, 3), input_shape=X_train.shape[1:]))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation("relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(64))

model.add(Dense(6))
model.add(Activation("softmax"))

# integer labels with a softmax output call for sparse categorical crossentropy
# (this is the loss the traceback shows; mean squared error is not suitable here)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])

# Data augmentation imports
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

# initialize the training data augmentation object
# (rescale is omitted because X_train was already divided by 255 above;
# rescaling here as well would normalise the images twice)
trainAug = ImageDataGenerator(
    rotation_range=20,
    zoom_range=0.05,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    horizontal_flip=True,
    fill_mode="nearest")

# initialize the validation (and testing) data augmentation object
valAug = ImageDataGenerator()

# set a learning rate annealer (defined here but not yet passed to fit_generator)
learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc',
                                            patience=3,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

# Augmented-images model development
# (fit is only required for featurewise statistics; harmless with these options)
trainAug.fit(X_train)

# Fit the model on the augmented batches
epochs = 10
batch_size = 10

history = model.fit_generator(trainAug.flow(X_train, y_train, batch_size=batch_size),
                              epochs=epochs,
                              validation_data=(X_test, y_test),
                              steps_per_epoch=X_train.shape[0] // batch_size)
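
For what it's worth, if the ReduceLROnPlateau callback and the valAug generator defined above are meant to take effect, the call would need to mention them explicitly. A sketch of that variant (still assuming X_test and y_test exist as NumPy arrays):

history = model.fit_generator(
    trainAug.flow(X_train, y_train, batch_size=batch_size),
    steps_per_epoch=X_train.shape[0] // batch_size,
    epochs=epochs,
    validation_data=valAug.flow(X_test, y_test, batch_size=batch_size),
    validation_steps=X_test.shape[0] // batch_size,
    callbacks=[learning_rate_reduction])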

Upvotes: 0

Views: 907

Answers (1)

Daniel Möller

Reputation: 86630

Initially you had 7 labels: your code was then expecting labels 0, 1, 2, 3, 4, 5, 6

You removed label 5 from the dataset, ok. Now you have 6 labels in total.
Your code is expecting: 0, 1, 2, 3, 4, 5

But what you have in your data is: 0, 1, 2, 3, 4, 6

After removing label 5, you need to transform every label 6 into 5 so the labels are contiguous again.

Something along the lines of:

# label above the removed class: shift it down by one
if (img == row['image_id'] + '.jpg') & (row['dx_code'] > 5):
    try:
        training_data.append([new_array, row['dx_code'] - 1])
    except Exception as ee:
        pass
# label below the removed class: keep it as-is
elif (img == row['image_id'] + '.jpg') & (row['dx_code'] < 5):
    try:
        training_data.append([new_array, row['dx_code']])
    except Exception as ee:
        pass
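
Equivalently, if the [image, label] pairs have already been collected, the remapping can be done in one vectorized step on the label array (a small sketch, assuming y_train is the list of dx_code values built earlier):

import numpy as np

y_train = np.array(y_train)
y_train[y_train > 5] -= 1   # 6 becomes 5, giving contiguous labels 0..5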

Upvotes: 2
