Reputation: 127
I am using the awesome RWeka package to fit a SMOreg model as implemented in Weka. Everything works fine, but I am having trouble extracting the weights from the fitted model.
Like all Weka classifier objects, my model has a nice print method that shows all the features and their relative weights. However, I am not able to extract these weights in any way.
You can see for yourself by running the following code:
library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
model_SMOreg <- SMOreg_classifier(mpg ~ ., data = mtcars)
Now, if you simply call the model
model_SMOreg
you'll see that it prints all the features used in the model with their relative weights. I would like to access those weights as a vector or, even better, as a two-column table with one column containing the feature names and the other containing the weights.
I am working on a Windows 7 x64 system, using RStudio Version 1.0.153, R 3.4.2 ("Short Summer") and RWeka 0.4-35.
Does anyone know how to do this?
Upvotes: 0
Views: 205
Reputation: 127
Based on the suggestion of @knb, I have written a function that extracts the weights from a SMOreg model and returns a tibble with one column for the feature names and one for the feature weights, with the rows sorted by decreasing absolute weight.
Note that this function only works for the SMOreg classifier, because the printed output of other classifiers has a slightly different layout. However, I think the function can easily be adapted to other classifiers.
library(stringr)
library(tidyverse)
extract_weights_from_SMOreg <- function(model) {
  # Silence the coercion warnings raised by as.numeric() below and
  # restore the previous setting when the function exits
  oldw <- options(warn = -1)
  on.exit(options(oldw), add = TRUE)

  # Capture the printed model, then drop the 3 header lines and the 5 trailing lines
  raw_output <- capture.output(model)
  trimmed_output <- raw_output[-c(1:3, (length(raw_output) - 4):length(raw_output))]

  # One row per feature plus one extra row for the intercept
  df <- tibble(features_name = vector(length = length(trimmed_output) + 1, "character"),
               features_weight = vector(length = length(trimmed_output) + 1, "numeric"))

  for (line in seq_along(trimmed_output)) {
    # Split the line on single spaces; non-numeric tokens become NA after as.numeric()
    string_as_vector <- trimmed_output[line] %>%
      str_split(string = ., pattern = " ") %>%
      unlist(.)
    numeric_element <- trimmed_output[line] %>%
      str_split(string = ., pattern = " ") %>%
      unlist(.) %>%
      as.numeric(.)

    # Position of the "*" token among the non-numeric tokens
    position_mul <- string_as_vector[is.na(numeric_element)] %>%
      str_detect(string = ., pattern = "[*]") %>%
      which(.)
    numeric_element <- numeric_element %>%
      `[`(., c(1:position_mul))
    text_element <- string_as_vector[is.na(numeric_element)]

    # The sign is printed as a separate "+" or "-" token
    there_is_plus <- string_as_vector[is.na(numeric_element)] %>%
      str_detect(string = ., pattern = "[+]") %>%
      sum(.)
    if (there_is_plus) {
      sign_is <- "+"
    } else {
      sign_is <- "-"
    }

    feature_weight <- numeric_element[!is.na(numeric_element)]
    if (sign_is == "-") {
      df[line, "features_weight"] <- feature_weight * -1
    } else {
      df[line, "features_weight"] <- numeric_element[!(is.na(numeric_element))]
    }
    df[line, "features_name"] <- paste(text_element[(position_mul + 1):length(text_element)], collapse = " ")
  }

  # The intercept is the last weight line: a sign followed by a number, without a feature name
  intercept_line <- raw_output[length(raw_output) - 4]
  there_is_plus_intercept <- intercept_line %>%
    str_detect(string = ., pattern = "[+]") %>%
    sum(.)
  if (there_is_plus_intercept) {
    intercept_sign_is <- "+"
  } else {
    intercept_sign_is <- "-"
  }
  numeric_intercept <- intercept_line %>%
    str_split(string = ., pattern = " ") %>%
    unlist(.) %>%
    as.numeric(.) %>%
    `[`(., length(.))
  df[nrow(df), "features_name"] <- "intercept"
  if (intercept_sign_is == "-") {
    df[nrow(df), "features_weight"] <- numeric_intercept * -1
  } else {
    df[nrow(df), "features_weight"] <- numeric_intercept
  }

  # Sort by decreasing absolute weight
  df <- df %>%
    arrange(desc(abs(features_weight)))
  return(df)
}
Here is an example for one model:
library(RWeka)
data("mtcars")
SMOreg_classifier <- make_Weka_classifier("weka/classifiers/functions/SMOreg")
mpg_model_weights <- extract_weights_from_SMOreg(SMOreg_classifier(data = mtcars, mpg ~ .))
mpg_model_weights
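Since the question also asks for a plain vector, the two-column result can be turned into a named numeric vector with base R. The snippet below is only a sketch: `weights_tbl` and its values are made up for illustration and merely mimic the column names returned by the function above.

```r
# Hypothetical table with the same two columns as the function's result;
# the numbers are invented, not real SMOreg output.
weights_tbl <- data.frame(
  features_name = c("(normalized) wt", "intercept", "(normalized) cyl"),
  features_weight = c(-0.31, 0.29, -0.02),
  stringsAsFactors = FALSE
)

# Named numeric vector: names are the features, values are the weights
weights_vec <- setNames(weights_tbl$features_weight, weights_tbl$features_name)

# Look up a single weight by feature name
weights_vec[["intercept"]]
```

With a real model, `weights_tbl` would simply be replaced by `mpg_model_weights`.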
Upvotes: 0
Reputation: 9303
I think you cannot get this in numeric format.
attr(model_SMOreg, "meta")$class # "Weka_classifier"
getAnywhere("print.Weka_classifier")
Result:
A single object matching ‘print.Weka_classifier’ was found
It was found in the following places
registered S3 method for print from namespace RWeka
namespace:RWeka
with value
function (x, ...)
{
writeLines(.jcall(x$classifier, "S", "toString"))
invisible(x)
}
<bytecode: 0x8328630>
<environment: namespace:RWeka>
So we see that print.Weka_classifier() calls writeLines() on the result of an rJava::.jcall() call, which returns a string.
Thus, I think you need to parse the weights yourself, perhaps by capturing the printed text with capture.output().
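A minimal base-R sketch of such parsing might look like the following. It assumes that every weight line has the layout "sign, magnitude, `*`, feature name" and that the intercept line has only a sign and a magnitude. The function name `parse_smoreg_lines` is hypothetical, and the sample lines are invented for illustration; they are not real SMOreg output.

```r
# Parse printed weight lines such as " -  0.0217 * (normalized) cyl"
# into a two-column data frame. Lines that do not look like a weight
# term (headers, blank lines, the kernel summary) are ignored.
parse_smoreg_lines <- function(lines) {
  # group 1: sign, group 2: magnitude, group 4: feature name (absent for the intercept)
  pattern <- "^[[:space:]]*([+-])[[:space:]]+([0-9][0-9.]*)[[:space:]]*(\\*[[:space:]]*(.*))?$"
  term_lines <- lines[grepl(pattern, lines)]

  sign_chr <- sub(pattern, "\\1", term_lines)
  magnitude <- as.numeric(sub(pattern, "\\2", term_lines))
  name <- trimws(sub(pattern, "\\4", term_lines))
  name[name == ""] <- "intercept"  # a term without a name is treated as the intercept

  data.frame(
    features_name = name,
    features_weight = ifelse(sign_chr == "-", -magnitude, magnitude),
    stringsAsFactors = FALSE
  )
}

# Invented sample lines that only imitate the printed layout
sample_lines <- c(
  "SMOreg",
  "",
  "weights (not support vectors):",
  " -       0.0217 * (normalized) cyl",
  " +       0.1432 * (normalized) wt",
  " +       0.2869",
  "",
  "Number of kernel evaluations: 528 (96.894% cached)"
)

weights <- parse_smoreg_lines(sample_lines)
weights
```

For a fitted model, the input would be `capture.output(model_SMOreg)` instead of `sample_lines`. Matching lines by pattern rather than by fixed position avoids hard-coding how many header and footer lines the printout has, but it still depends on the printed layout staying the same.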
Upvotes: 1