Reputation: 95
I have a tiny dataset of around 300 rows. Each row has:

Column A: an image
Column B: categorical text input
Column C: categorical text input
Column D: categorical text output
I am able to use a sequential Keras model on the image input alone (Column A) to predict the output (Column D), but the accuracy is pretty abysmal (around 40%). How can I combine the image data with the categorical input data (Columns B and C) to obtain better accuracy?
Following is the code I'm using. I get this error on model.fit:
ValueError: could not convert string to float: 'item1'
There are no numbers in my data; everything is categorical text. I think I need to change how the model treats 'y' so it knows the prediction is categorical rather than numeric, but I'm not sure what to change.
# Imports assumed by the code below (Colab + TensorFlow Keras)
from google.colab import drive
import numpy as np
import pandas as pd
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.preprocessing import image
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Input, Dense, Conv2D, Activation,
    BatchNormalization, MaxPooling2D, Flatten, Dropout, concatenate)
from tensorflow.keras.optimizers import Adam

drive.mount('/content/gdrive/')
train = pd.read_csv(r'gdrive/My Drive/Colab Notebooks/Fast AI/testfilled.csv')
# read_csv already returns a DataFrame, so just select the relevant columns
df = train[['Column A', 'Column B', 'Column C', 'Column D']]
def process_categorical_attributes(df, train, test):
    # One-hot encode Column B, fitting on the full frame so train and test share categories
    zipBinarizer = LabelBinarizer().fit(df["Column B"])
    trainCategorical = zipBinarizer.transform(train["Column B"])
    testCategorical = zipBinarizer.transform(test["Column B"])
    # Same for Column C (note: transform with zipBinarizer2, not zipBinarizer)
    zipBinarizer2 = LabelBinarizer().fit(df["Column C"])
    trainCategorical2 = zipBinarizer2.transform(train["Column C"])
    testCategorical2 = zipBinarizer2.transform(test["Column C"])
    # Stack the two one-hot blocks into a single feature matrix
    trainX = np.hstack([trainCategorical, trainCategorical2])
    testX = np.hstack([testCategorical, testCategorical2])
    return (trainX, testX)
def load_piece_images(df):
    # Load each image, resize to 400x400 RGB, and scale pixels to [0, 1]
    # (reads filenames from the global `train` frame, which keeps the FileName column)
    train_image = []
    for i in tqdm(range(train.shape[0])):
        img = image.load_img('gdrive/My Drive/Colab Notebooks/OutputDir/' + train['FileName'][i] + '.bmp',
                             target_size=(400, 400, 3))
        img = image.img_to_array(img)
        img = img / 255
        train_image.append(img)
    return np.array(train_image)
def create_mlp(dim, regress=False):
    # Small fully connected branch for the categorical features
    model = Sequential()
    model.add(Dense(8, input_dim=dim, activation="relu"))
    model.add(Dense(4, activation="relu"))
    if regress:
        model.add(Dense(1, activation="linear"))
    return model
def create_cnn(width, height, depth, filters=(16, 32, 64), regress=False):
    # Convolutional branch for the image input: one conv/pool block per filter size
    inputShape = (height, width, depth)
    chanDim = -1
    inputs = Input(shape=inputShape)
    for (i, f) in enumerate(filters):
        if i == 0:
            x = inputs
        x = Conv2D(f, (3, 3), padding="same")(x)
        x = Activation("relu")(x)
        x = BatchNormalization(axis=chanDim)(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(16)(x)
    x = Activation("relu")(x)
    x = BatchNormalization(axis=chanDim)(x)
    x = Dropout(0.5)(x)
    x = Dense(4)(x)
    x = Activation("relu")(x)
    if regress:
        x = Dense(1, activation="linear")(x)
    model = Model(inputs, x)
    return model
images = load_piece_images(df)
split = train_test_split(df, images, test_size=0.25, random_state=42)
(trainAttrX, testAttrX, trainImagesX, testImagesX) = split
trainY = trainAttrX["Column D"]
testY = testAttrX["Column D"]
(trainAttrX, testAttrX) = process_categorical_attributes(df, trainAttrX, testAttrX)

mlp = create_mlp(trainAttrX.shape[1], regress=False)
cnn = create_cnn(400, 400, 3, regress=False)
combinedInput = concatenate([mlp.output, cnn.output])
x = Dense(4, activation="relu")(combinedInput)
x = Dense(1, activation="linear")(x)
x = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[mlp.input, cnn.input], outputs=x)

opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss="mean_absolute_percentage_error", optimizer=opt)
model.fit(
    [trainAttrX, trainImagesX], trainY,
    validation_data=([testAttrX, testImagesX], testY),
    epochs=20, batch_size=2)
Upvotes: 2
Views: 3384
Reputation: 19904
Another option that is sometimes used (e.g. in conditional GANs and AlphaFold 2) is to encode the categorical data as extra scalar feature channels in the input image. So, for example, if you have the categories 1-hot encoded you would take the vector that looks something like [0, 1, 0, ...] and extend your RGB channels with one channel full of 0's, another full of 1's, another full of 0's, etc.
The advantage over the other approach (concatenating the categorical feature encoding onto the neural embedding) is that the convolutional layers themselves see the extra features, which can matter when the RGB channels alone don't carry enough information to discriminate between the classes of interest. The disadvantage is that it is less efficient in compute and memory.
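For concreteness, here is a minimal sketch of that channel-tiling idea in NumPy; the helper name add_category_channels is illustrative, not from any library:

import numpy as np

def add_category_channels(img, onehot):
    # img: (H, W, 3) float array; onehot: length-k vector of 0s and 1s
    h, w, _ = img.shape
    # broadcast each category value into a constant (H, W) plane
    planes = np.ones((h, w, len(onehot)), dtype=img.dtype) * np.asarray(onehot, dtype=img.dtype)
    # result has shape (H, W, 3 + k)
    return np.concatenate([img, planes], axis=-1)

With the asker's create_cnn this just means passing depth=3 + k instead of 3, and applying the helper to every image before training.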
Upvotes: 0
Reputation: 15053
This tutorial does a great job of explaining how to use multiple input sources (text + image data): https://www.pyimagesearch.com/2019/02/04/keras-multiple-inputs-and-mixed-data/
Essentially this is exactly what you are looking for; a minimal sketch of the same pattern adapted to your categorical target follows.
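This sketch reuses create_mlp and create_cnn from the question; the parts that change are one-hot labels, a softmax head sized to the number of classes, and categorical cross-entropy loss (which also resolves the "could not convert string to float" error):

from sklearn.preprocessing import LabelBinarizer
from tensorflow.keras.layers import Dense, concatenate
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# One-hot encode the string labels while trainAttrX/testAttrX are still
# DataFrames, i.e. before process_categorical_attributes turns them into arrays
lb = LabelBinarizer().fit(df["Column D"])
trainY = lb.transform(trainAttrX["Column D"])
testY = lb.transform(testAttrX["Column D"])
(trainAttrX, testAttrX) = process_categorical_attributes(df, trainAttrX, testAttrX)

mlp = create_mlp(trainAttrX.shape[1], regress=False)
cnn = create_cnn(400, 400, 3, regress=False)
combined = concatenate([mlp.output, cnn.output])
x = Dense(16, activation="relu")(combined)
# One output unit per class, softmax instead of the linear/sigmoid pair
outputs = Dense(len(lb.classes_), activation="softmax")(x)

model = Model(inputs=[mlp.input, cnn.input], outputs=outputs)
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(1e-3), metrics=["accuracy"])
model.fit([trainAttrX, trainImagesX], trainY,
          validation_data=([testAttrX, testImagesX], testY),
          epochs=20, batch_size=2)

One caveat: if Column D has only two classes, LabelBinarizer returns a single 0/1 column, in which case a 1-unit sigmoid head with binary_crossentropy is the usual choice.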
Upvotes: 2