Reputation:
I have the following output from Weka for SVM classification. I want to plot the SVM classifier output as anomaly or normal. How can I get the SVM scoring function out of this output?
=== Run information ===
Scheme: weka.classifiers.functions.SMO -C 1.0 -L 0.001 -P 1.0E-12 -N 0 -V -1 -W 1 -K "weka.classifiers.functions.supportVector.PolyKernel -E 1.0 -C 250007"
Relation: KDDTrain
Instances: 125973
Attributes: 42
duration
protocol_type
service
flag
src_bytes
dst_bytes
land
wrong_fragment
urgent
hot
num_failed_logins
logged_in
num_compromised
root_shell
su_attempted
num_root
num_file_creations
num_shells
num_access_files
num_outbound_cmds
is_host_login
is_guest_login
count
srv_count
serror_rate
srv_serror_rate
rerror_rate
srv_rerror_rate
same_srv_rate
diff_srv_rate
srv_diff_host_rate
dst_host_count
dst_host_srv_count
dst_host_same_srv_rate
dst_host_diff_srv_rate
dst_host_same_src_port_rate
dst_host_srv_diff_host_rate
dst_host_serror_rate
dst_host_srv_serror_rate
dst_host_rerror_rate
dst_host_srv_rerror_rate
class
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
SMO
Kernel used:
Linear Kernel: K(x,y) = <x,y>
Classifier for classes: normal, anomaly
BinarySMO
Machine linear: showing attribute weights, not support vectors.
-0.0498 * (normalized) duration
+ 0.5131 * (normalized) protocol_type=tcp
+ -0.6236 * (normalized) protocol_type=udp
+ 0.1105 * (normalized) protocol_type=icmp
+ -1.1861 * (normalized) service=auth
+ 0 * (normalized) service=bgp
+ 0 * (normalized) service=courier
+ 1 * (normalized) service=csnet_ns
+ 1 * (normalized) service=ctf
+ 1 * (normalized) service=daytime
+ -0 * (normalized) service=discard
+ -1.2505 * (normalized) service=domain
+ -0.6878 * (normalized) service=domain_u
+ 0.9418 * (normalized) service=echo
+ 1.1964 * (normalized) service=eco_i
+ 0.9767 * (normalized) service=ecr_i
+ 0.0073 * (normalized) service=efs
+ 0.0595 * (normalized) service=exec
+ -1.4426 * (normalized) service=finger
+ -1.047 * (normalized) service=ftp
+ -1.4225 * (normalized) service=ftp_data
+ 2 * (normalized) service=gopher
+ 1 * (normalized) service=hostnames
+ -0.9961 * (normalized) service=http
+ 0.7255 * (normalized) service=http_443
+ 0.5128 * (normalized) service=imap4
+ -6.3664 * (normalized) service=IRC
+ 1 * (normalized) service=iso_tsap
+ -0 * (normalized) service=klogin
+ 0 * (normalized) service=kshell
+ 0.7422 * (normalized) service=ldap
+ 1 * (normalized) service=link
+ 0.5993 * (normalized) service=login
+ 1 * (normalized) service=mtp
+ 1 * (normalized) service=name
+ 0.2322 * (normalized) service=netbios_dgm
+ 0.213 * (normalized) service=netbios_ns
+ 0.1902 * (normalized) service=netbios_ssn
+ 1.1472 * (normalized) service=netstat
+ 0.0504 * (normalized) service=nnsp
+ 1.058 * (normalized) service=nntp
+ -1 * (normalized) service=ntp_u
+ -1.5344 * (normalized) service=other
+ 1.3595 * (normalized) service=pm_dump
+ 0.8355 * (normalized) service=pop_2
+ -2 * (normalized) service=pop_3
+ 0 * (normalized) service=printer
+ 1.051 * (normalized) service=private
+ -0.3082 * (normalized) service=red_i
+ 1.0034 * (normalized) service=remote_job
+ 1.0112 * (normalized) service=rje
+ -1.0454 * (normalized) service=shell
+ -1.6948 * (normalized) service=smtp
+ 0.1388 * (normalized) service=sql_net
+ -0.3438 * (normalized) service=ssh
+ 1 * (normalized) service=supdup
+ 0.8756 * (normalized) service=systat
+ -1.6856 * (normalized) service=telnet
+ -0 * (normalized) service=tim_i
+ -0.8579 * (normalized) service=time
+ -0.726 * (normalized) service=urh_i
+ -1.0285 * (normalized) service=urp_i
+ 1.0347 * (normalized) service=uucp
+ 0 * (normalized) service=uucp_path
+ 0 * (normalized) service=vmnet
+ 1 * (normalized) service=whois
+ -1.3388 * (normalized) service=X11
+ 0 * (normalized) service=Z39_50
+ 1.7882 * (normalized) flag=OTH
+ -3.0982 * (normalized) flag=REJ
+ -1.7279 * (normalized) flag=RSTO
+ 1 * (normalized) flag=RSTOS0
+ 2.4264 * (normalized) flag=RSTR
+ 1.5906 * (normalized) flag=S0
+ -1.952 * (normalized) flag=S1
+ -0.9628 * (normalized) flag=S2
+ -0.3455 * (normalized) flag=S3
+ 1.2757 * (normalized) flag=SF
+ 0.0054 * (normalized) flag=SH
+ 0.8742 * (normalized) src_bytes
+ 0.0542 * (normalized) dst_bytes
+ -1.2659 * (normalized) land=1
+ 2.7922 * (normalized) wrong_fragment
+ 0.0662 * (normalized) urgent
+ 8.1153 * (normalized) hot
+ 2.4822 * (normalized) num_failed_logins
+ 0.2242 * (normalized) logged_in=1
+ -0.0544 * (normalized) num_compromised
+ 0.9248 * (normalized) root_shell
+ -2.363 * (normalized) su_attempted
+ -0.2024 * (normalized) num_root
+ -1.2791 * (normalized) num_file_creations
+ -0.0314 * (normalized) num_shells
+ -1.4125 * (normalized) num_access_files
+ -0.0154 * (normalized) is_host_login=1
+ -2.3307 * (normalized) is_guest_login=1
+ 4.3191 * (normalized) count
+ -2.7484 * (normalized) srv_count
+ -0.6276 * (normalized) serror_rate
+ 2.843 * (normalized) srv_serror_rate
+ 0.6105 * (normalized) rerror_rate
+ 3.1388 * (normalized) srv_rerror_rate
+ -0.1262 * (normalized) same_srv_rate
+ -0.1825 * (normalized) diff_srv_rate
+ 0.2961 * (normalized) srv_diff_host_rate
+ 0.7812 * (normalized) dst_host_count
+ -1.0053 * (normalized) dst_host_srv_count
+ 0.0284 * (normalized) dst_host_same_srv_rate
+ 0.4419 * (normalized) dst_host_diff_srv_rate
+ 1.384 * (normalized) dst_host_same_src_port_rate
+ 0.8004 * (normalized) dst_host_srv_diff_host_rate
+ 0.2301 * (normalized) dst_host_serror_rate
+ 0.6401 * (normalized) dst_host_srv_serror_rate
+ 0.6422 * (normalized) dst_host_rerror_rate
+ 0.3692 * (normalized) dst_host_srv_rerror_rate
- 2.5266
Number of kernel evaluations: -1049600465
Output prediction - sample output
inst# actual predicted error prediction
1 1:normal 1:normal 1
2 1:normal 1:normal 1
3 2:anomaly 2:anomaly 1
4 1:normal 1:normal 1
5 1:normal 1:normal 1
6 2:anomaly 2:anomaly 1
7 2:anomaly 2:anomaly 1
8 2:anomaly 2:anomaly 1
9 2:anomaly 2:anomaly 1
10 2:anomaly 2:anomaly 1
11 2:anomaly 2:anomaly 1
12 2:anomaly 2:anomaly 1
13 1:normal 1:normal 1
14 2:anomaly 1:normal + 1
15 2:anomaly 2:anomaly 1
16 2:anomaly 2:anomaly 1
17 1:normal 1:normal 1
18 2:anomaly 2:anomaly 1
19 1:normal 1:normal 1
20 1:normal 1:normal 1
21 2:anomaly 2:anomaly 1
22 2:anomaly 2:anomaly 1
23 1:normal 1:normal 1
24 1:normal 1:normal 1
25 2:anomaly 2:anomaly 1
26 1:normal 1:normal 1
27 2:anomaly 2:anomaly 1
28 1:normal 1:normal 1
29 1:normal 1:normal 1
30 1:normal 1:normal 1
31 2:anomaly 2:anomaly 1
32 2:anomaly 2:anomaly 1
33 1:normal 1:normal 1
34 2:anomaly 2:anomaly 1
35 1:normal 1:normal 1
36 1:normal 1:normal 1
37 1:normal 1:normal 1
38 2:anomaly 2:anomaly 1
39 1:normal 1:normal 1
40 2:anomaly 2:anomaly 1
41 2:anomaly 2:anomaly 1
42 2:anomaly 2:anomaly 1
43 1:normal 1:normal 1
44 1:normal 1:normal 1
45 1:normal 1:normal 1
46 2:anomaly 2:anomaly 1
47 2:anomaly 2:anomaly 1
48 1:normal 1:normal 1
49 2:anomaly 1:normal + 1
50 2:anomaly 2:anomaly 1
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.986 0.039 0.967 0.986 0.976 0.948 0.973 0.960 normal
0.961 0.014 0.983 0.961 0.972 0.948 0.973 0.963 anomaly
Weighted Avg. 0.974 0.028 0.974 0.974 0.974 0.948 0.973 0.962
=== Confusion Matrix ===
a b <-- classified as
66389 954 | a = normal
2301 56329 | b = anomaly
Upvotes: 1
Views: 1688
Reputation: 77837
That output is the scoring function. Read the equals sign as a simple Boolean test, evaluating to 1 for true and 0 for false. Thus, out of all the possible values of a nominal attribute, only one of its coefficients will affect the scoring value for a given instance.
For example, let's consider only the first three attributes, with these normalized inputs and resulting values:
duration        2.0      -0.0498 * 2.0  =>  -0.0996
protocol_type   icmp      0.1105 * 1    =>   0.1105
service         eco_i     1.1964 * 1    =>   1.1964
Note that the other protocol_type and service terms (such as -0.6236 * (normalized) protocol_type=udp) have comparisons that evaluate to 0, since the example's protocol_type is icmp rather than udp, so those coefficients don't affect the overall sum.
From these three attributes, the score so far is the sum of those three terms, or 1.2073. Continue through the remaining attributes in the same way, add the constant -2.5266 at the end, and there's your vector's score.
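To make that concrete, here is a minimal Python sketch of the same partial sum (the three weights and the example values are taken from the worked example above; the instance is assumed to be already normalized, since Weka applies that normalization internally):

# Weights copied from the model dump above, keyed the way Weka prints them.
weights = {
    "duration": -0.0498,
    "protocol_type=icmp": 0.1105,
    "service=eco_i": 1.1964,
}

# Example (already normalized) instance: duration 2.0, protocol_type icmp,
# service eco_i.  A dummy term like "service=eco_i" is 1.0 when the nominal
# value matches and 0.0 otherwise, so the non-matching terms simply drop out.
instance = {
    "duration": 2.0,
    "protocol_type=icmp": 1.0,
    "service=eco_i": 1.0,
}

partial_score = sum(w * instance.get(name, 0.0) for name, w in weights.items())
print(partial_score)  # -0.0996 + 0.1105 + 1.1964 = 1.2073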
Does that explain it well enough?
The critical phrase in the blog you cite is:
if the output of the scoring function is negative then the input is classified as belonging to class y = -1. If the score is positive, the input is classified as belonging to class y = 1.
Yes, it's that simple: implement that nice, linear scoring function (the 115 weighted terms listed above, plus the constant). Plug in a vector. If the function comes up positive, the vector is normal; if it comes up negative, the vector is an anomaly.
Yes, your model is significantly longer than the blog's example. That example is based on two continuous features; you have 41 input features, three of which (protocol_type, service, flag) are nominal, so each of their possible values gets its own weighted term, hence the much longer list. The example also shows its support vectors explicitly; because your machine is linear, Weka collapses them into the attribute weights shown above ("showing attribute weights, not support vectors"). However, even this high-dimensional model operates on the same principle: positive = normal, negative = anomaly.
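A sketch of what that looks like in code, assuming you have loaded every coefficient from the dump into a dict keyed exactly as Weka prints it (the function and variable names here are just illustrative):

BIAS = -2.5266  # the constant at the end of the weight list

def svm_score(instance, weights, bias=BIAS):
    # Dot product of the weights with the (normalized) instance, plus the bias.
    # `instance` maps names such as "duration" or "service=eco_i" to values;
    # dummy terms are 1.0 when the nominal value matches, 0.0 otherwise.
    return sum(w * instance.get(name, 0.0) for name, w in weights.items()) + bias

def classify(instance, weights):
    # Positive score -> normal, negative -> anomaly, per the rule above.
    return "normal" if svm_score(instance, weights) > 0 else "anomaly"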
As for your desire to map to a 2-dimensional display ... it's possible ... but I don't know what you'd find meaningful in this instance. Mapping 42 variables to 3 creates a lot of congestion in our space. I've seen some nice tricks here and there, especially with gradient fields where the force vectors are in the same spatial interpretation as the data points. A weather map manages to represent x,y,z coordinates of a measurement, adding wind velocity (3D), cloud cover, and maybe a couple other metrics into the display. That's maybe 10 symbolic dimensions.
In your case, we could perhaps just drop the dimensions whose coefficients are smaller in magnitude than 0.07 as insignificant; that saves 6 features. The three nominal features we could perhaps represent with color, a dashed/dotted/solid symbol, and a tiny text overlay on the O or X (normal/anomaly data points). That's 9 down without using Cartesian position (x, y, z coordinates, assuming the plot is meaningful in 3D).
However, I don't know your data nearly well enough to suggest where we might cram the remaining 33 features into 2 or 3 dimensions. Can you somehow combine any of those inputs? Does a linear combination of multiple features give you a result that is still meaningful in prediction?
If not, then we're stuck with the canonical approach: pick interesting combinations of features (usually pairs) and plot a graph for each, ignoring the other features entirely. If none of those make visual sense ... there's our answer: no, we can't plot the data nicely. Sorry, but reality often does this to us in a complex environment, and we fall back to tables, correlations, and other methods our 3D minds can handle.
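If you try the pairwise-plot route, here is a minimal matplotlib sketch (the CSV export of the ARFF file, its file name, and the particular pair of columns are all assumptions; substitute whatever pair you find interesting):

import matplotlib.pyplot as plt
import pandas as pd

# Assumes the training data has been exported from ARFF to CSV with a header row.
df = pd.read_csv("KDDTrain.csv")

# One scatter plot for a single pair of features, ignoring all the others.
for label, marker in (("normal", "o"), ("anomaly", "x")):
    subset = df[df["class"] == label]
    plt.scatter(subset["src_bytes"], subset["count"], marker=marker, s=10, label=label)

plt.xlabel("src_bytes")
plt.ylabel("count")
plt.legend()
plt.show()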
Upvotes: 1
Reputation: 4101
A somewhat different approach, but I guess it can solve your underlying problem. I assume you used the Weka Explorer to generate the model.
If you go to the Classify tab, click on More options..., and tick Output predictions, you get the probability of each class for every instance. This should allow you to plot normal vs. anomaly.
For the iris dataset I get something like:
inst#, actual, predicted, error, probability distribution
1 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
2 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
3 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
4 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
5 3:Iris-vir 3:Iris-vir 0 0.333 *0.667
6 1:Iris-set 1:Iris-set *0.667 0.333 0
7 1:Iris-set 1:Iris-set *0.667 0.333 0
8 1:Iris-set 1:Iris-set *0.667 0.333 0
9 1:Iris-set 1:Iris-set *0.667 0.333 0
10 1:Iris-set 1:Iris-set *0.667 0.333 0
It contains the probability for each class.
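A minimal sketch for plotting those probabilities, assuming you save the prediction listing to a text file (the file name is hypothetical, and the code assumes a two-class listing like the one from your SMO model, where the last two fields on each data line are the class probabilities and the predicted class is marked with an asterisk):

import matplotlib.pyplot as plt

first_class_probs = []
with open("predictions.txt") as f:  # hypothetical file holding the prediction listing
    for line in f:
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the header and any blank lines
        # The last two fields are the probability distribution over the two
        # classes; strip the "*" that marks the predicted class.
        p_first = float(fields[-2].lstrip("*"))
        first_class_probs.append(p_first)

plt.plot(first_class_probs, ".")
plt.xlabel("instance")
plt.ylabel("probability of first class (normal)")
plt.show()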
Upvotes: 1