Reputation: 97
I created the following function, which checks whether a column's absolute correlation with the target exceeds a threshold. The function is applied to the diamonds dataset (assigned to dt here).
select_variables_gen <- function(variable, target = dt$price, threshold = 0.9) {
  # Only numeric or integer columns can be correlated with the target
  if (all(class(variable) %in% c("numeric", "integer"))) {
    corr <- abs(cor(variable, target))
    if (corr > threshold) {
      return(TRUE)
    } else {
      FALSE
    }
  } else {
    FALSE
  }
}
Now that I want to apply the function, I can't figure out how to specify its arguments. This is what I tried:
alt_selected_gen <- names(dt)[sapply(dt,
select_variables(variable = dt, target = dt$carat, threshold = 0.1))]
alt_selected_gen;
This returns an error saying that the 2nd and 3rd arguments are unused. How can I use the function (with sapply or any other way) so that I can specify the arguments?
My desired output is the names of the columns whose correlation with the target is above the threshold. So, using the default values with the above code, that would be:
[1] "carat" "price"
Upvotes: 1
Views: 911
Reputation: 173813
You pass your function itself to sapply. What you are trying to pass is a call to your function.
When you use sapply on a data frame, the columns get sent one by one to your function as its first argument. If you want to pass further named arguments to your function, you just add them directly as parameters to sapply after the function itself. This works because of the dots argument (...) in sapply's formal arguments, which passes any extra parameters into the call to your function.
It should therefore just be
names(dt)[sapply(dt, select_variables_gen, target = dt$carat, threshold = 0.1)]
#> [1] "carat" "table" "price" "x" "y" "z"
Notice also that the function is called select_variables_gen in your example, not select_variables.
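To see the argument forwarding in isolation, here is a minimal sketch that runs without the diamonds data. The data frame toy and the helper is_correlated are made up for illustration and are not from the question. It compares passing extra arguments through sapply's ... with wrapping the call in an anonymous function, which should give the same result.

```r
# Hypothetical toy data standing in for the diamonds data frame
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 6, 8, 11),
  c = c(5, 3, 4, 1, 2),
  g = c("u", "v", "u", "v", "u"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a numeric column's absolute
# correlation with the target exceeds the threshold
is_correlated <- function(variable, target, threshold = 0.9) {
  if (!is.numeric(variable)) {
    return(FALSE)
  }
  abs(cor(variable, target)) > threshold
}

# Named arguments placed after the function are forwarded through `...`;
# each column of toy becomes the first argument (variable)
via_dots <- sapply(toy, is_correlated, target = toy$a, threshold = 0.9)

# The same call written with an explicit anonymous wrapper
via_wrapper <- sapply(toy, function(col) {
  is_correlated(col, target = toy$a, threshold = 0.9)
})

names(toy)[via_dots]
#> [1] "a" "b"
```

The ... form is shorter. The wrapper form is handy when the column should not be the first argument of the function you are calling.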
Upvotes: 2