Reputation:
I have the following output from Weka for SVM classification. I want to plot the SVM classifier output as anomaly or normal. How can I get the SVM scoring function out of this output?
=== Run information ===
Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Relation: KDDTrain
Instances: 125973
Attributes: 42
duration
protocol_type
service
flag
src_bytes
dst_bytes
land
wrong_fragment
urgent
hot
num_failed_logins
logged_in
num_compromised
root_shell
su_attempted
num_root
num_file_creations
num_shells
num_access_files
num_outbound_cmds
is_host_login
is_guest_login
count
srv_count
serror_rate
srv_serror_rate
rerror_rate
srv_rerror_rate
same_srv_rate
diff_srv_rate
srv_diff_host_rate
dst_host_count
dst_host_srv_count
dst_host_same_srv_rate
dst_host_diff_srv_rate
dst_host_same_src_port_rate
dst_host_srv_diff_host_rate
dst_host_serror_rate
dst_host_srv_serror_rate
dst_host_rerror_rate
dst_host_srv_rerror_rate
class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
SMO
Kernel used:
Linear Kernel: K(x,y) = <x,y>
Classifier for classes: normal, anomaly
BinarySMO
Machine linear: showing attribute weights, not support vectors.
-0.0498 * (normalized) duration
+ 0.5131 * (normalized) protocol_type=tcp
+ -0.6236 * (normalized) protocol_type=udp
+ 0.1105 * (normalized) protocol_type=icmp
+ -1.1861 * (normalized) service=auth
+ 0 * (normalized) service=bgp
+ 0 * (normalized) service=courier
+ 1 * (normalized) service=csnet_ns
+ 1 * (normalized) service=ctf
+ 1 * (normalized) service=daytime
+ -0 * (normalized) service=discard
+ -1.2505 * (normalized) service=domain
+ -0.6878 * (normalized) service=domain_u
+ 0.9418 * (normalized) service=echo
+ 1.1964 * (normalized) service=eco_i
+ 0.9767 * (normalized) service=ecr_i
+ 0.0073 * (normalized) service=efs
+ 0.0595 * (normalized) service=exec
+ -1.4426 * (normalized) service=finger
+ -1.047 * (normalized) service=ftp
+ -1.4225 * (normalized) service=ftp_data
+ 2 * (normalized) service=gopher
+ 1 * (normalized) service=hostnames
+ -0.9961 * (normalized) service=http
+ 0.7255 * (normalized) service=http_443
+ 0.5128 * (normalized) service=imap4
+ -6.3664 * (normalized) service=IRC
+ 1 * (normalized) service=iso_tsap
+ -0 * (normalized) service=klogin
+ 0 * (normalized) service=kshell
+ 0.7422 * (normalized) service=ldap
+ 1 * (normalized) service=link
+ 0.5993 * (normalized) service=login
+ 1 * (normalized) service=mtp
+ 1 * (normalized) service=name
+ 0.2322 * (normalized) service=netbios_dgm
+ 0.213 * (normalized) service=netbios_ns
+ 0.1902 * (normalized) service=netbios_ssn
+ 1.1472 * (normalized) service=netstat
+ 0.0504 * (normalized) service=nnsp
+ 1.058 * (normalized) service=nntp
+ -1 * (normalized) service=ntp_u
+ -1.5344 * (normalized) service=other
+ 1.3595 * (normalized) service=pm_dump
+ 0.8355 * (normalized) service=pop_2
+ -2 * (normalized) service=pop_3
+ 0 * (normalized) service=printer
+ 1.051 * (normalized) service=private
+ -0.3082 * (normalized) service=red_i
+ 1.0034 * (normalized) service=remote_job
+ 1.0112 * (normalized) service=rje
+ -1.0454 * (normalized) service=shell
+ -1.6948 * (normalized) service=smtp
+ 0.1388 * (normalized) service=sql_net
+ -0.3438 * (normalized) service=ssh
+ 1 * (normalized) service=supdup
+ 0.8756 * (normalized) service=systat
+ -1.6856 * (normalized) service=telnet
+ -0 * (normalized) service=tim_i
+ -0.8579 * (normalized) service=time
+ -0.726 * (normalized) service=urh_i
+ -1.0285 * (normalized) service=urp_i
+ 1.0347 * (normalized) service=uucp
+ 0 * (normalized) service=uucp_path
+ 0 * (normalized) service=vmnet
+ 1 * (normalized) service=whois
+ -1.3388 * (normalized) service=X11
+ 0 * (normalized) service=Z39_50
+ 1.7882 * (normalized) flag=OTH
+ -3.0982 * (normalized) flag=REJ
+ -1.7279 * (normalized) flag=RSTO
+ 1 * (normalized) flag=RSTOS0
+ 2.4264 * (normalized) flag=RSTR
+ 1.5906 * (normalized) flag=S0
+ -1.952 * (normalized) flag=S1
+ -0.9628 * (normalized) flag=S2
+ -0.3455 * (normalized) flag=S3
+ 1.2757 * (normalized) flag=SF
+ 0.0054 * (normalized) flag=SH
+ 0.8742 * (normalized) src_bytes
+ 0.0542 * (normalized) dst_bytes
+ -1.2659 * (normalized) land=1
+ 2.7922 * (normalized) wrong_fragment
+ 0.0662 * (normalized) urgent
+ 8.1153 * (normalized) hot
+ 2.4822 * (normalized) num_failed_logins
+ 0.2242 * (normalized) logged_in=1
+ -0.0544 * (normalized) num_compromised
+ 0.9248 * (normalized) root_shell
+ -2.363 * (normalized) su_attempted
+ -0.2024 * (normalized) num_root
+ -1.2791 * (normalized) num_file_creations
+ -0.0314 * (normalized) num_shells
+ -1.4125 * (normalized) num_access_files
+ -0.0154 * (normalized) is_host_login=1
+ -2.3307 * (normalized) is_guest_login=1
+ 4.3191 * (normalized) count
+ -2.7484 * (normalized) srv_count
+ -0.6276 * (normalized) serror_rate
+ 2.843 * (normalized) srv_serror_rate
+ 0.6105 * (normalized) rerror_rate
+ 3.1388 * (normalized) srv_rerror_rate
+ -0.1262 * (normalized) same_srv_rate
+ -0.1825 * (normalized) diff_srv_rate
+ 0.2961 * (normalized) srv_diff_host_rate
+ 0.7812 * (normalized) dst_host_count
+ -1.0053 * (normalized) dst_host_srv_count
+ 0.0284 * (normalized) dst_host_same_srv_rate
+ 0.4419 * (normalized) dst_host_diff_srv_rate
+ 1.384 * (normalized) dst_host_same_src_port_rate
+ 0.8004 * (normalized) dst_host_srv_diff_host_rate
+ 0.2301 * (normalized) dst_host_serror_rate
+ 0.6401 * (normalized) dst_host_srv_serror_rate
+ 0.6422 * (normalized) dst_host_rerror_rate
+ 0.3692 * (normalized) dst_host_srv_rerror_rate
- 2.5266
Number of kernel evaluations: -1049600465
Output prediction - sample output
inst# actual predicted error prediction
1 1:normal 1:normal 1
2 1:normal 1:normal 1
3 2:anomaly 2:anomaly 1
4 1:normal 1:normal 1
5 1:normal 1:normal 1
6 2:anomaly 2:anomaly 1
7 2:anomaly 2:anomaly 1
8 2:anomaly 2:anomaly 1
9 2:anomaly 2:anomaly 1
10 2:anomaly 2:anomaly 1
11 2:anomaly 2:anomaly 1
12 2:anomaly 2:anomaly 1
13 1:normal 1:normal 1
14 2:anomaly 1:normal + 1
15 2:anomaly 2:anomaly 1
16 2:anomaly 2:anomaly 1
17 1:normal 1:normal 1
18 2:anomaly 2:anomaly 1
19 1:normal 1:normal 1
20 1:normal 1:normal 1
21 2:anomaly 2:anomaly 1
22 2:anomaly 2:anomaly 1
23 1:normal 1:normal 1
24 1:normal 1:normal 1
25 2:anomaly 2:anomaly 1
26 1:normal 1:normal 1
27 2:anomaly 2:anomaly 1
28 1:normal 1:normal 1
29 1:normal 1:normal 1
30 1:normal 1:normal 1
31 2:anomaly 2:anomaly 1
32 2:anomaly 2:anomaly 1
33 1:normal 1:normal 1
34 2:anomaly 2:anomaly 1
35 1:normal 1:normal 1
36 1:normal 1:normal 1
37 1:normal 1:normal 1
38 2:anomaly 2:anomaly 1
39 1:normal 1:normal 1
40 2:anomaly 2:anomaly 1
41 2:anomaly 2:anomaly 1
42 2:anomaly 2:anomaly 1
43 1:normal 1:normal 1
44 1:normal 1:normal 1
45 1:normal 1:normal 1
46 2:anomaly 2:anomaly 1
47 2:anomaly 2:anomaly 1
48 1:normal 1:normal 1
49 2:anomaly 1:normal + 1
50 2:anomaly 2:anomaly 1
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.986 0.039 0.967 0.986 0.976 0.948 0.973 0.960 normal
0.961 0.014 0.983 0.961 0.972 0.948 0.973 0.963 anomaly
Weighted Avg. 0.974 0.028 0.974 0.974 0.974 0.948 0.973 0.962
=== Confusion Matrix ===
a b <-- classified as
66389 954 | a = normal
2301 56329 | b = anomaly
Upvotes: 1
Views: 1688
Reputation: 77837
That output is the scoring function. Read the equals sign as a simple Boolean test, evaluating to 1 for true and 0 for false. Thus, out of all the possible values of a nominal attribute, only one of its coefficients will affect the scoring value for a given instance.
For example, let's consider only the first three attributes, with these normalized inputs and resulting values:
duration        2.0      -0.0498 * 2.0  =>  -0.0996
protocol_type   icmp      0.1105 * 1    =>   0.1105
service         eco_i     1.1964 * 1    =>   1.1964
Note that the other protocol_type and service terms (such as -0.6236 * (normalized) protocol_type=udp) have comparisons that evaluate to 0, since the example's protocol_type is icmp rather than udp, so those coefficients don't affect the overall sum.
From these three attributes, the score so far is the sum of those three terms, or 1.2073. Continue through the remaining attributes in the same way, add the constant -2.5266 at the end, and there's your vector's score.
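To make that concrete, here is a minimal Python sketch of the same partial sum (the three weights and the example values are taken from the worked example above; the instance is assumed to be already normalized, since Weka applies that normalization internally):

# Weights copied from the model dump above, keyed the way Weka prints them.
weights = {
    "duration": -0.0498,
    "protocol_type=icmp": 0.1105,
    "service=eco_i": 1.1964,
}

# Example (already normalized) instance: duration 2.0, protocol_type icmp,
# service eco_i.  A dummy term like "service=eco_i" is 1.0 when the nominal
# value matches and 0.0 otherwise, so the non-matching terms simply drop out.
instance = {
    "duration": 2.0,
    "protocol_type=icmp": 1.0,
    "service=eco_i": 1.0,
}

partial_score = sum(w * instance.get(name, 0.0) for name, w in weights.items())
print(partial_score)  # -0.0996 + 0.1105 + 1.1964 = 1.2073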
Does that explain it well enough?
The critical phrase in the blog you cite is:
if the output of the scoring function is negative then the input is classified as belonging to class y = -1. If the score is positive, the input is classified as belonging to class y = 1.
Yes, it's that simple: implement that nice, linear scoring function (the 115 weighted terms listed above, plus the constant). Plug in a vector. If the function comes up positive, the vector is normal; if it comes up negative, the vector is an anomaly.
Yes, your model is significantly longer than the blog's example. That example is based on two continuous features; you have 41 input features, three of which (protocol_type, service, flag) are nominal, so each of their possible values gets its own weighted term, hence the much longer list. The example also shows its support vectors explicitly; because your machine is linear, Weka collapses them into the attribute weights shown above ("showing attribute weights, not support vectors"). However, even this high-dimensional model operates on the same principle: positive = normal, negative = anomaly.
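A sketch of what that looks like in code, assuming you have loaded every coefficient from the dump into a dict keyed exactly as Weka prints it (the function and variable names here are just illustrative):

BIAS = -2.5266  # the constant at the end of the weight list

def svm_score(instance, weights, bias=BIAS):
    # Dot product of the weights with the (normalized) instance, plus the bias.
    # `instance` maps names such as "duration" or "service=eco_i" to values;
    # dummy terms are 1.0 when the nominal value matches, 0.0 otherwise.
    return sum(w * instance.get(name, 0.0) for name, w in weights.items()) + bias

def classify(instance, weights):
    # Positive score -> normal, negative -> anomaly, per the rule above.
    return "normal" if svm_score(instance, weights) > 0 else "anomaly"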
As for your desire to map to a 2-dimensional display ... it's possible ... but I don't know what you'd find meaningful in this instance. Mapping 42 variables to 3 creates a lot of congestion in our space. I've seen some nice tricks here and there, especially with gradient fields where the force vectors are in the same spatial interpretation as the data points. A weather map manages to represent x,y,z coordinates of a measurement, adding wind velocity (3D), cloud cover, and maybe a couple other metrics into the display. That's maybe 10 symbolic dimensions.
In your case, we could perhaps just drop the dimensions whose coefficients are smaller in magnitude than 0.07 as insignificant; that saves 6 features. The three nominal features we could perhaps represent with color, a dashed/dotted/solid symbol, and a tiny text overlay on the O or X (normal/anomaly data points). That's 9 down without using Cartesian position (x, y, z coordinates, assuming the plot is meaningful in 3D).
However, I don't know your data nearly well enough to suggest where we might cram the remaining 33 features into 2 or 3 dimensions. Can you somehow combine any of those inputs? Does a linear combination of multiple features give you a result that is still meaningful in prediction?
If not, then we're stuck with the canonical approach: pick interesting combinations of features (usually pairs) and plot a graph for each, ignoring the other features entirely. If none of those make visual sense ... there's our answer: no, we can't plot the data nicely. Sorry, but reality often does this to us in a complex environment, and we fall back to tables, correlations, and other methods our 3D minds can handle.
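If you try the pairwise-plot route, here is a minimal matplotlib sketch (the CSV export of the ARFF file, its file name, and the particular pair of columns are all assumptions; substitute whatever pair you find interesting):

import matplotlib.pyplot as plt
import pandas as pd

# Assumes the training data has been exported from ARFF to CSV with a header row.
df = pd.read_csv("KDDTrain.csv")

# One scatter plot for a single pair of features, ignoring all the others.
for label, marker in (("normal", "o"), ("anomaly", "x")):
    subset = df[df["class"] == label]
    plt.scatter(subset["src_bytes"], subset["count"], marker=marker, s=10, label=label)

plt.xlabel("src_bytes")
plt.ylabel("count")
plt.legend()
plt.show()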
Upvotes: 1
Reputation: 4101
A somewhat different approach, but I guess it can solve your underlying problem. I assume you used the Weka Explorer to generate the model.
If you go to the Classify tab, click on More options..., and tick Output predictions, you get the probability of each class for every instance. This should allow you to plot normal vs. anomaly.
For the iris dataset I get something like:
inst#, actual, predicted, error, probability distribution
1 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
2 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
3 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
4 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
5 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
6 1:Iris-set 1:Iris-set *0.667 0.333 0
7 1:Iris-set 1:Iris-set *0.667 0.333 0
8 1:Iris-set 1:Iris-set *0.667 0.333 0
9 1:Iris-set 1:Iris-set *0.667 0.333 0
10 1:Iris-set 1:Iris-set *0.667 0.333 0
It contains the probability for each class.
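A minimal sketch for plotting those probabilities, assuming you save the prediction listing to a text file (the file name is hypothetical, and the code assumes a two-class listing like the one from your SMO model, where the last two fields on each data line are the class probabilities and the predicted class is marked with an asterisk):

import matplotlib.pyplot as plt

first_class_probs = []
with open("predictions.txt") as f:  # hypothetical file holding the prediction listing
    for line in f:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header and any blank lines
        # The last two fields are the probability distribution over the two
        # classes; strip the "*" that marks the predicted class.
        p_first = float(fields[-2].lstrip("*"))
        first_class_probs.append(p_first)

plt.plot(first_class_probs, ".")
plt.xlabel("instance")
plt.ylabel("probability of first class (normal)")
plt.show()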
Upvotes: 1