Reputation: 95
I have a tiny dataset of around 300 rows. Each row has:

Column A: an image
Column B: categorical text input
Column C: categorical text input
Column D: categorical text output
I am able to use a sequential Keras model on the image input alone (Column A) to predict the output (Column D), but the accuracy is pretty abysmal (around 40%). How can I combine the image data with the categorical input data (Columns B and C) to obtain better accuracy?
Following is the code I'm using. I get this error on model.fit:
ValueError: could not convert string to float: 'item1'
There are no numbers in my data; everything is categorical text. I think I need to change how the model treats 'y' so it knows the prediction is categorical rather than numeric, but I'm not sure what to change.
# Imports assumed by the code below (Colab + TensorFlow Keras)
from google.colab import drive
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Input, Dense, Conv2D, Activation,
    BatchNormalization, MaxPooling2D, Flatten, Dropout, concatenate)
from tensorflow.keras.optimizers import Adam

drive.mount('/content/gdrive/')
train = pd.read_csv(r'gdrive/My Drive/Colab Notebooks/Fast AI/testfilled.csv')
# read_csv already returns a DataFrame, so just select the relevant columns
df = train[['Column A', 'Column B', 'Column C', 'Column D']]
def process_categorical_attributes(df, train, test):
    # One-hot encode Column B, fitting on the full frame so train and test share categories
    zipBinarizer = LabelBinarizer().fit(df["Column B"])
    trainCategorical = zipBinarizer.transform(train["Column B"])
    testCategorical = zipBinarizer.transform(test["Column B"])
    # Same for Column C (note: transform with zipBinarizer2, not zipBinarizer)
    zipBinarizer2 = LabelBinarizer().fit(df["Column C"])
    trainCategorical2 = zipBinarizer2.transform(train["Column C"])
    testCategorical2 = zipBinarizer2.transform(test["Column C"])
    # Stack the two one-hot blocks into a single feature matrix
    trainX = np.hstack([trainCategorical, trainCategorical2])
    testX = np.hstack([testCategorical, testCategorical2])
    return (trainX, testX)
def load_piece_images(df):
    # Load each image, resize to 400x400 RGB, and scale pixels to [0, 1]
    # (reads filenames from the global `train` frame, which keeps the FileName column)
    train_image = []
    for i in tqdm(range(train.shape[0])):
        img = image.load_img('gdrive/My Drive/Colab Notebooks/OutputDir/' + train['FileName'][i] + '.bmp',
                             target_size=(400, 400, 3))
        img = image.img_to_array(img)
        img = img / 255
        train_image.append(img)
    return np.array(train_image)
def create_mlp(dim, regress=False):
    # Small fully connected branch for the categorical features
    model = Sequential()
    model.add(Dense(8, input_dim=dim, activation="relu"))
    model.add(Dense(4, activation="relu"))
    if regress:
        model.add(Dense(1, activation="linear"))
    return model
def create_cnn(width, height, depth, filters=(16, 32, 64), regress=False):
    # Convolutional branch for the image input: one conv/pool block per filter size
    inputShape = (height, width, depth)
    chanDim = -1
    inputs = Input(shape=inputShape)
    for (i, f) in enumerate(filters):
        if i == 0:
            x = inputs
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(16)(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = Dropout(0.5)(x)
    x = Dense(4)(x)
    x = Activation("relu")(x)
    if regress:
        x = Dense(1, activation="linear")(x)
    model = Model(inputs, x)
    return model
images = load_piece_images(df)
split = train_test_split(df, images, test_size=0.25, random_state=42)
(trainAttrX, testAttrX, trainImagesX, testImagesX) = split
trainY = trainAttrX["Column D"]
testY = testAttrX["Column D"]
(trainAttrX, testAttrX) = process_categorical_attributes(df, trainAttrX, testAttrX)

mlp = create_mlp(trainAttrX.shape[1], regress=False)
cnn = create_cnn(400, 400, 3, regress=False)
combinedInput = concatenate([mlp.output, cnn.output])
x = Dense(4, activation="relu")(combinedInput)
x = Dense(1, activation="linear")(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)

opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)
model.fit(
    [trainAttrX, trainImagesX], trainY,
    validation_data=([testAttrX, testImagesX], testY),
    epochs=20, batch_size=2)
Upvotes: 2
Views: 3384
Reputation: 19904
Another option that is sometimes used (e.g. in conditional GANs and AlphaFold 2) is to encode the categorical data as extra scalar feature channels in the input image. So, for example, if you have the categories 1-hot encoded you would take the vector that looks something like [0, 1, 0, ...] and extend your RGB channels with one channel full of 0's, another full of 1's, another full of 0's, etc.
The advantage over the other approach (concatenating the categorical feature encoding onto the neural embedding) is that the convolutional layers themselves see the extra features, which can matter when the RGB channels alone don't carry enough information to discriminate between the classes of interest. The disadvantage is that it is less efficient in compute and memory.
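For concreteness, here is a minimal sketch of that channel-tiling idea in NumPy; the helper name add_category_channels is illustrative, not from any library:

import numpy as np

def add_category_channels(img, onehot):
    # img: (H, W, 3) float array; onehot: length-k vector of 0s and 1s
    h, w, _ = img.shape
    # broadcast each category value into a constant (H, W) plane
    planes = np.ones((h, w, len(onehot)), dtype=img.dtype) * np.asarray(onehot, dtype=img.dtype)
    # result has shape (H, W, 3 + k)
    return np.concatenate([img, planes], axis=-1)

With the asker's create_cnn this just means passing depth=3 + k instead of 3, and applying the helper to every image before training.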
Upvotes: 0
Reputation: 15053
This tutorial does a great job of explaining how to use multiple input sources (text + image data): https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
Essentially this is exactly what you are looking for; a minimal sketch of the same pattern adapted to your categorical target follows.
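This sketch reuses create_mlp and create_cnn from the question; the parts that change are one-hot labels, a softmax head sized to the number of classes, and categorical cross-entropy loss (which also resolves the "could not convert string to float" error):

from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.layers import Dense, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# One-hot encode the string labels while trainAttrX/testAttrX are still
# DataFrames, i.e. before process_categorical_attributes turns them into arrays
lb = LabelBinarizer().fit(df["Column D"])
trainY = lb.transform(trainAttrX["Column D"])
testY = lb.transform(testAttrX["Column D"])
(trainAttrX, testAttrX) = process_categorical_attributes(df, trainAttrX, testAttrX)

mlp = create_mlp(trainAttrX.shape[1], regress=False)
cnn = create_cnn(400, 400, 3, regress=False)
combined = concatenate([mlp.output, cnn.output])
x = Dense(16, activation="relu")(combined)
# One output unit per class, softmax instead of the linear/sigmoid pair
outputs = Dense(len(lb.classes_), activation="softmax")(x)

model = Model(inputs=[mlp.input, cnn.input], outputs=outputs)
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(1e-3), metrics=["accuracy"])
model.fit([trainAttrX, trainImagesX], trainY,
          validation_data=([testAttrX, testImagesX], testY),
          epochs=20, batch_size=2)

One caveat: if Column D has only two classes, LabelBinarizer returns a single 0/1 column, in which case a 1-unit sigmoid head with binary_crossentropy is the usual choice.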
Upvotes: 2