Alwani
Alwani

Reputation: 125

Different approaches for applying SVM in Keras

I want to build a multi-class classification model using Keras. My data is containing 7 features and 4 labels. If I am using Keras I have seen two ways to apply the Support vector Machine (SVM) algorithm.

First: A Quasi-SVM in Keras By using the (RandomFourierFeatures layer) presented here I have built the following model:

def create_keras_model():
  initializer = tf.keras.initializers.GlorotNormal()
  return tf.keras.models.Sequential([
                            layers.Input(shape=(7,)),
                            RandomFourierFeatures(output_dim=4822, kernel_initializer=initializer),
                            layers.Dense(units=4, activation='softmax'),
                            ])

Second: Using the last layer in the network as described here as follows:

def create_keras_model():
  return tf.keras.models.Sequential([
            tf.keras.layers.Input(shape=(7,)),
            tf.keras.layers.Dense(64),
            tf.keras.layers.Dense(4, kernel_regularizer=l2(0.01)),
            tf.keras.layers.Softmax()
                        
  ])

note: CategoricalHinge() was used as the loss function. My question is: are these approaches appropriate and can be defined as applying of SVM model or it is just an approximation of the model architecture? in short, can I say this is applying of SVM model?

Upvotes: 4

Views: 1519

Answers (2)

I'mahdi
I'mahdi

Reputation: 24059

You can check two models on your data like below:

I check on mnist dataset and get the below result:

  1. Less overfitting with the second approach
  2. Fast training time with the first approach
  3. Less trainable params with the first approach
  4. Accuracy for two approaches same as each other
from keras.utils.layer_utils import count_params  
import matplotlib.pyplot as plt
import tensorflow as tf
import seaborn as sns
import pandas as pd
import time


def create_model(approach):

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(784,)))
    if  approach == 'Quasi_SVM':
        model.add(tf.keras.layers.experimental.RandomFourierFeatures(
            output_dim=4096, scale=10.0, 
            kernel_initializer="gaussian"))
        model.add(tf.keras.layers.Dense(10))


    if approach == 'kernel_regularizer':
        model.add(tf.keras.layers.Dense(128, activation='relu'))
        model.add(tf.keras.layers.Dense(64, activation='relu'))
        model.add(tf.keras.layers.Dense(32, activation='relu'))
        model.add(tf.keras.layers.Dense(16, activation='relu'))
        model.add(tf.keras.layers.Dense(10, 
                                        kernel_regularizer = tf.keras.regularizers.l2(0.01), 
                                        activation='softmax')) 
    

    model.compile(
        optimizer = 'adam',
        loss = 'hinge',
        metrics=['accuracy'],
    )

    return model


(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

x_train = x_train.reshape(-1, 784).astype("float32") / 255
x_test = x_test.reshape(-1, 784).astype("float32") / 255

y_train = tf.keras.utils.to_categorical(y_train)
y_test = tf.keras.utils.to_categorical(y_test)

for approach in ['Quasi_SVM', 'kernel_regularizer']:

    model = create_model(approach)
    start = time.time()
    history = model.fit(x_train, y_train, epochs=30, batch_size=128, validation_split=0.2)
    print(f'Training time {approach} : {time.time() - start} sec')
    print(f'Trainable params {approach} : {count_params(model.trainable_weights)}')
    print(f'Accuracy on x_test {approach} : {model.evaluate(x_test, y_test, verbose=0)[1]}')

    
    df = pd.DataFrame(history.history).rename_axis('epoch').reset_index().melt(id_vars=['epoch'])
    fig, axes = plt.subplots(1,2, figsize=(18,6))
    for ax, mtr in zip(axes.flat, ['loss', 'accuracy']):
        ax.set_title(f'{approach} {mtr.title()} Plot')
        dfTmp = df[df['variable'].str.contains(mtr)]
        sns.lineplot(data=dfTmp, x='epoch', y='value', hue='variable', ax=ax)

    fig.tight_layout()
    plt.show()

Output: (benchmark on colab)

Training time Quasi_SVM : 43.78484082221985 sec
Trainable params Quasi_SVM : 40970
Accuracy on x_test Quasi_SVM : 0.9729999899864197
Training time kernel_regularizer : 45.47012114524841 sec
Trainable params kernel_regularizer : 111514
Accuracy on x_test kernel_regularizer : 0.972100019454956

enter image description here

enter image description here

Upvotes: 1

Hossein Biniazian
Hossein Biniazian

Reputation: 74

I think it's just an approximation of the SVM model, because the pure definition of SVM stand on this theorem that, we have to compute the support vector with the Primal-Dual Optimization approach and use this support vector for draw maximum-margin hyperplane. but in the neural network and the framework like Keras(in general tensorflow) mostly use the gradient descent optimization approach to find optimal parameter. Besides, I think the number of parameters, which we have to optimize in pure SVM is different with the neural network, like which you have been wrote in the question.

Upvotes: 1

Related Questions