Reputation: 125
I want to build a multi-class classification model using Keras. My data contains 7 features and 4 labels. Using Keras, I have seen two ways to apply the Support Vector Machine (SVM) algorithm.
First: a quasi-SVM in Keras. Using the RandomFourierFeatures layer presented here, I have built the following model:
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import RandomFourierFeatures

def create_keras_model():
    initializer = tf.keras.initializers.GlorotNormal()
    return tf.keras.models.Sequential([
        layers.Input(shape=(7,)),
        RandomFourierFeatures(output_dim=4822, kernel_initializer=initializer),
        layers.Dense(units=4, activation='softmax'),
    ])
Second: using the last layer of the network as an SVM, as described here:
from tensorflow.keras.regularizers import l2

def create_keras_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=(7,)),
        tf.keras.layers.Dense(64),
        tf.keras.layers.Dense(4, kernel_regularizer=l2(0.01)),
        tf.keras.layers.Softmax()
    ])
Note: CategoricalHinge() was used as the loss function.
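For completeness, the compile step for either model could look like the minimal sketch below. Only the CategoricalHinge loss is stated above; the optimizer and metric are my assumptions.
model = create_keras_model()
model.compile(
    optimizer='adam',  # assumption: the question does not specify an optimizer
    loss=tf.keras.losses.CategoricalHinge(),
    metrics=['accuracy'],
)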
My question is: are these approaches appropriate, and can they be described as applying an SVM model, or are they just approximations of the model architecture? In short, can I say this is applying an SVM model?
Upvotes: 4
Views: 1519
Reputation: 24059
You can compare the two models on your data as below. I checked them on the mnist dataset and got the results shown below:
from keras.utils.layer_utils import count_params
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
import pandas as pd
import time
def create_model(approach):
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))
    if approach == 'Quasi_SVM':
        model.add(tf.keras.layers.experimental.RandomFourierFeatures(
            output_dim=4096, scale=10.0,
            kernel_initializer="gaussian"))
        model.add(tf.keras.layers.Dense(10))
    if approach == 'kernel_regularizer':
        model.add(tf.keras.layers.Dense(128, activation='relu'))
        model.add(tf.keras.layers.Dense(64, activation='relu'))
        model.add(tf.keras.layers.Dense(32, activation='relu'))
        model.add(tf.keras.layers.Dense(16, activation='relu'))
        model.add(tf.keras.layers.Dense(10,
            kernel_regularizer=tf.keras.regularizers.l2(0.01),
            activation='softmax'))
    model.compile(
        optimizer='adam',
        loss='hinge',
        metrics=['accuracy'],
    )
    return model
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255
x_test = x_test.reshape(-1, 784).astype("float32") / 255
y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)
for approach in ['Quasi_SVM', 'kernel_regularizer']:
    model = create_model(approach)
    start = time.time()
    history = model.fit(x_train, y_train, epochs=30, batch_size=128, validation_split=0.2)
    print(f'Training time {approach} : {time.time() - start} sec')
    print(f'Trainable params {approach} : {count_params(model.trainable_weights)}')
    print(f'Accuracy on x_test {approach} : {model.evaluate(x_test, y_test, verbose=0)[1]}')

    df = pd.DataFrame(history.history).rename_axis('epoch').reset_index().melt(id_vars=['epoch'])
    fig, axes = plt.subplots(1, 2, figsize=(18, 6))
    for ax, mtr in zip(axes.flat, ['loss', 'accuracy']):
        ax.set_title(f'{approach} {mtr.title()} Plot')
        dfTmp = df[df['variable'].str.contains(mtr)]
        sns.lineplot(data=dfTmp, x='epoch', y='value', hue='variable', ax=ax)
    fig.tight_layout()
    plt.show()
Output (benchmark on Colab):
Training time Quasi_SVM : 43.78484082221985 sec
Trainable params Quasi_SVM : 40970
Accuracy on x_test Quasi_SVM : 0.9729999899864197
Training time kernel_regularizer : 45.47012114524841 sec
Trainable params kernel_regularizer : 111514
Accuracy on x_test kernel_regularizer : 0.972100019454956
Upvotes: 1
Reputation: 74
I think it's just an approximation of an SVM model, because a pure SVM rests on computing the support vectors via primal-dual optimization and using those support vectors to draw the maximum-margin hyperplane. Neural networks, and frameworks like Keras (TensorFlow in general), instead mostly use gradient-descent optimization to find the optimal parameters. Besides, I think the number of parameters that have to be optimized in a pure SVM differs from that of a neural network like the ones you wrote in the question.
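As a minimal illustration of the difference, assuming scikit-learn is available (the synthetic dataset here is only a stand-in for the 7-feature, 4-class data in the question):
# A "pure" SVM solves a constrained optimization problem (SMO, a primal-dual
# method) and exposes explicit support vectors; a Dense layer trained with a
# hinge loss by gradient descent has no such notion.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=7, n_informative=5,
                           n_classes=4, random_state=0)

svm = SVC(kernel='rbf').fit(X, y)   # exact kernel SVM, not an approximation
print(svm.support_vectors_.shape)   # the support vectors are explicit here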
Upvotes: 1