torkestativ

Reputation: 382

Get the certainty of the OneClassSVM classification results

I have performed anomaly detection on unlabeled data using OneClassSVM from sklearn. To inspect the results, I want to filter on the certainty of the classifications. I came across predict_proba, but calling it on OneClassSVM raises AttributeError: 'OneClassSVM' object has no attribute 'predict_proba'. I am not sure predict_proba is the right approach; I only found it while searching for a solution to this problem.

Here is a snippet of the data, where CompanyID is the ID of a mall and 1 and 2 are sensors on two separate entrances of the mall:

import pandas as pd
df = pd.DataFrame({"Datetime": pd.to_datetime(["2016-06-13", "2016-06-14",
                                               "2016-06-15", "2016-06-16"]),
                  "CompanyID": [271, 271, 271, 271],
                  "1": [140, 143, 142, 143],
                  "2": [42, 43, 49, 230]})

This is the OneClassSVM model. I am unsure how to get the certainty of its classifications.

#support vector machines outlier detection
from sklearn import preprocessing, svm
import matplotlib.pyplot as plt

def find_outliers(ts, perc=0.02, figsize=(15,5)):
    ## fit svm
    scaler = preprocessing.StandardScaler()
    ts_scaled = scaler.fit_transform(ts.values.reshape(-1,1))
    model = svm.OneClassSVM(nu=perc, kernel="rbf", gamma=0.03)
    model.fit(ts_scaled)
    ## dtf output
    df_outliers = ts.to_frame(name="ts")
    df_outliers["index"] = ts.index
    df_outliers["outlier"] = model.predict(ts_scaled)
    ## map sklearn's labels (-1 = outlier, +1 = inlier) to 1/0
    df_outliers["outlier"] = df_outliers["outlier"].apply(lambda x: 1 if x == -1 else 0)
    ## CERTAINTY OF THE CLASSIFICATION
    ## this line raises the AttributeError quoted above
    df_outliers["probability"] = model.predict_proba(df_outliers)
    
    ## plot
    fig, ax = plt.subplots(figsize=figsize)
    # ts.name is the column name, so the title does not rely on the global loop variable
    plt.title(f'SVM - Entrance: {ts.name}. Number of outliers: '+str(sum(df_outliers["outlier"]==1)))

    ax.plot(df_outliers["index"], df_outliers["ts"],
            color="black")
    ax.scatter(x=df_outliers[df_outliers["outlier"]==1]["index"],
               y=df_outliers[df_outliers["outlier"]==1]['ts'],
               color='red')
    ax.grid(True)
    plt.show()

    # Return outlier column here
    return df_outliers["outlier"]

# loop over the entrances of the mall
for column in df.columns[2:]:
    find_outliers(df[column])

Edit: As @Zoro has pointed out, predict_proba is not available for OneClassSVM. How can I go about solving this?

Upvotes: 0

Views: 375

Answers (1)

Zoro

Reputation: 423

I skimmed the documentation for sklearn.svm.OneClassSVM, and it does not define a predict_proba method. You could use tree-based classifiers such as decision trees or RandomForestClassifier instead.
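
For what it's worth, OneClassSVM does expose decision_function, which returns a signed distance to the learned boundary (positive for inliers, negative for outliers). Below is a minimal sketch, assuming that distance is an acceptable stand-in for "certainty"; it is not a calibrated probability. The helper name outlier_scores and the synthetic data are made up for illustration, and nu and gamma simply mirror the values in the question.

```python
import numpy as np
from sklearn import preprocessing, svm


def outlier_scores(values, nu=0.02, gamma=0.03):
    """Return (labels, scores) from a OneClassSVM fitted on a 1-D series.

    Hypothetical helper; nu and gamma mirror the values in the question.
    """
    scaled = preprocessing.StandardScaler().fit_transform(
        np.asarray(values, dtype=float).reshape(-1, 1)
    )
    model = svm.OneClassSVM(nu=nu, kernel="rbf", gamma=gamma).fit(scaled)
    labels = model.predict(scaled)            # +1 = inlier, -1 = outlier
    scores = model.decision_function(scaled)  # signed distance to the boundary
    return labels, scores


# Synthetic data (an assumption, not the asker's sensor readings).
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(45, 3, 200), [230.0]])

labels, scores = outlier_scores(values)

# Most negative score first: the points lying furthest on the outlier side.
ranking = np.argsort(scores)
print(values[ranking[:5]], scores[ranking[:5]])
```

In the question's find_outliers function, the failing predict_proba line could probably be swapped for model.decision_function(ts_scaled), and the resulting column used to sort or threshold the flagged rows. How far below zero counts as "certain" is a judgment call that depends on the data.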

Upvotes: 1
