Reputation: 602
I'm writing a program to classify texts into a few classes. Right now, the program loads the train and test samples of word indices, applies an embedding layer and a convolutional layer, and classifies them into the classes. I'm trying to add handcrafted features for experimentation, as in the following code. The features variable is a list of two elements: the first element holds the features for the training data, and the second holds the features for the test data. Each training/test sample has a corresponding feature vector (i.e. these are sample-level features, not word features).
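For example, with made-up numbers (3 training samples, 2 test samples, and 4 handcrafted features per sample), features looks like this:
import numpy as np

# Hypothetical shapes: 3 training samples and 2 test samples,
# each with a 4-dimensional handcrafted feature vector
features = [
    np.array([[0.1, 0.0, 1.0, 0.3],   # training-sample features
              [0.7, 0.2, 0.0, 0.9],
              [0.0, 0.5, 0.4, 0.1]]),
    np.array([[0.2, 0.8, 0.0, 0.6],   # test-sample features
              [0.9, 0.1, 0.3, 0.0]]),
]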
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, Convolution1D, GlobalMaxPooling1D, Merge

model = Sequential()
model.add(Embedding(params.nb_words,
                    params.embedding_dims,
                    weights=[embedding_matrix],
                    input_length=params.maxlen,
                    trainable=params.trainable))
model.add(Convolution1D(nb_filter=params.nb_filter,
                        filter_length=params.filter_length,
                        border_mode='valid',
                        activation='relu'))
model.add(Dropout(params.dropout_rate))
model.add(GlobalMaxPooling1D())

# Adding hand-picked features: each sample's feature vector is squashed
# to a single unit before being concatenated with the sequence features
model_features = Sequential()
nb_features = len(features[0][0])
model_features.add(Dense(1,
                         input_shape=(nb_features,),
                         init='uniform',
                         activation='relu'))

model_final = Sequential()
model_final.add(Merge([model, model_features], mode='concat'))
model_final.add(Dense(len(citfunc.funcs), activation='softmax'))
model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
print(model_final.summary())

model_final.fit([x_train, features[0]], y_train,
                nb_epoch=params.nb_epoch,
                batch_size=params.batch_size,
                class_weight=data.get_class_weights(x_train, y_train))
y_pred = model_final.predict([x_test, features[1]])
My question is: is this code correct? And is there a conventional way of adding per-sample features to text sequences like this?
Upvotes: 7
Views: 2318
Reputation: 40506
Try:
from keras.models import Model
from keras.layers import Input, Dense, Dropout, Embedding, Convolution1D, GlobalMaxPooling1D, merge

# Sequence branch: embedding -> convolution -> global max pooling
seq_input = Input(shape=(params.maxlen,))
embedding = Embedding(params.nb_words,
                      params.embedding_dims,
                      weights=[embedding_matrix],
                      input_length=params.maxlen,
                      trainable=params.trainable)(seq_input)
conv = Convolution1D(nb_filter=params.nb_filter,
                     filter_length=params.filter_length,
                     border_mode='valid',
                     activation='relu')(embedding)
drop = Dropout(params.dropout_rate)(conv)
seq_features = GlobalMaxPooling1D()(drop)

# Adding hand-picked features: fed in as a second input, unchanged
nb_features = len(features[0][0])
other_features = Input(shape=(nb_features,))

# Concatenate both branches and classify
merged = merge([seq_features, other_features], mode='concat')
predictions = Dense(len(citfunc.funcs), activation='softmax')(merged)
model_final = Model([seq_input, other_features], predictions)
model_final.compile(loss='categorical_crossentropy',
                    optimizer='adam',
                    metrics=['accuracy'])
In this case you are merging the features from the sequence analysis with the custom features directly, without first squashing all of the custom features down to a single feature with a Dense(1) layer.
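The two-input model is then trained and used exactly as in your code (a minimal sketch; it reuses your existing x_train, y_train, x_test, features, params, and data objects):
# The model expects two inputs, in the same order as in Model([...], ...):
# the padded word-index sequences and the per-sample feature vectors
model_final.fit([x_train, features[0]], y_train,
                nb_epoch=params.nb_epoch,
                batch_size=params.batch_size,
                class_weight=data.get_class_weights(x_train, y_train))
y_pred = model_final.predict([x_test, features[1]])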
Upvotes: 9