Jack Someone

Reputation: 97

Specify arguments when applying function with sapply

I created the following function, which checks whether a column's correlation with the target exceeds a threshold. It is applied to the diamonds dataset (assigned to dt here).

select_variables_gen <- function(variable, target = dt$price, threshold = 0.9){
  if(all(class(variable) %in% c("numeric","integer"))){
    corr <-  abs(cor(variable, target));
    if(corr > threshold){
      return(T);
    }else{F}
  }else{F}
};

Now I want to apply the function, but I can't figure out how to specify its arguments. This is what I tried:

alt_selected_gen <- names(dt)[sapply(dt, 
select_variables(variable = dt, target = dt$carat, threshold = 0.1))]

alt_selected_gen;

This returns an error saying that the 2nd and 3rd arguments are unused. How can I use the function (with sapply or any other way) so that I can specify the arguments?

My desired output is the names of the columns whose correlation with the target is above the threshold. Using the default values, that would be:

[1] "carat" "price"

Upvotes: 1

Views: 911

Answers (1)

Allan Cameron

Reputation: 173813

You need to pass the function itself to sapply. What you are passing instead is a call to your function.

When you use sapply on a data frame, the columns get sent one by one to your function as its first argument. If you want to pass further named arguments to your function, you just add them directly as arguments to sapply after the function itself. This works because of the dots argument (...) in sapply's formal arguments, which passes any extra arguments into each call to your function.

It should therefore just be

names(dt)[sapply(dt, select_variables_gen, target = dt$carat, threshold = 0.1)]
#> [1] "carat" "table" "price" "x"     "y"     "z"  

Notice also that the function is called select_variables_gen in your example, not select_variables.
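
If you want to check the mechanism without loading diamonds, here is a minimal sketch on a made-up data frame. The names toy and is_correlated are hypothetical stand-ins for dt and select_variables_gen, not part of your code. It shows the extra arguments being forwarded through ..., and the equivalent form with an anonymous function, which is handy when the column should not be the first argument.

```r
# Toy data frame standing in for `dt`; the columns are invented for illustration.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 10),            # perfectly correlated with a
  c = c(5, 1, 4, 2, 3),             # weakly correlated with a
  g = c("u", "v", "w", "x", "y"),   # non-numeric, so never selected
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a numeric column's absolute
# correlation with `target` exceeds `threshold`, FALSE otherwise.
is_correlated <- function(variable, target, threshold = 0.9) {
  is.numeric(variable) && abs(cor(variable, target)) > threshold
}

# Named arguments placed after the function are forwarded through `...`
# to every call, while each column is supplied as the first argument.
via_dots <- sapply(toy, is_correlated, target = toy$a, threshold = 0.5)

# Equivalent form: wrap the call in an anonymous function.
via_lambda <- sapply(toy, function(col) {
  is_correlated(col, target = toy$a, threshold = 0.5)
})

names(toy)[via_dots]
#> [1] "a" "b"
```

Both forms give the same logical vector. The ... form is shorter, while the anonymous function makes it explicit which argument each column fills.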

Upvotes: 2
