Ujjawal Bhandari

Reputation: 1372

Run Function in Loop in R

I need to run the function below in a loop, as I have hundreds of variables.

binning <- function(df, vars, by = 0.1, eout = TRUE, verbose = FALSE) {
  for (col in vars) {
    breaks <- numeric(0)
    if (eout) {
      # Exclude outliers before computing the quantile breaks
      x <- boxplot(df[, col][!df[[col]] %in% boxplot.stats(df[[col]])$out], plot = FALSE)
      non_outliers <- df[, col][df[[col]] <= x$stats[5] & df[[col]] >= x$stats[1]]
      # Extend the breaks downwards so that low outliers still get a bin
      if (!(min(df[[col]]) == min(non_outliers))) {
        breaks <- c(breaks, min(df[[col]]))
      }
    }
    breaks <- c(breaks, quantile(if (eout) non_outliers else df[[col]], probs = seq(0, 1, by = by)))
    if (eout) {
      # Extend the breaks upwards so that high outliers still get a bin
      if (!(max(df[[col]]) == max(non_outliers))) {
        breaks <- c(breaks, max(df[[col]]))
      }
    }
    # Note: return() exits on the first column, so only vars[1] is binned per call
    return(cut(df[[col]], breaks = breaks, include.lowest = TRUE))
  }
}

It returns the binned version of a variable. My naming convention for the new variable is the original name plus "_bin", for example:

data$credit_amount_bin <- binning(data, "credit_amount", eout = FALSE)

I want the function to run for all the numeric variables, store the binned variables in a separate data frame, and name each of them with the original name plus "_bin".

Any help would be highly appreciated.

Upvotes: 0

Views: 135

Answers (1)

SimonG

Reputation: 4881

Using your function, you could go via lapply, looping over all columns that are numeric.

# some data
dat0 <- data.frame(a=letters[1:10], x=rnorm(10), y=rnorm(10), z=rnorm(10))

# find all numeric by names
vars <- colnames(dat0)[which(sapply(dat0,is.numeric))]

# target data set
dat1 <- as.data.frame( lapply(vars, function(x) binning(dat0,x,eout=FALSE)) )
colnames(dat1) <- paste(vars, "_bin", sep="")

Personally, I would prefer having this function with vector input instead of data frame plus variable names. It might run more efficiently, too.
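
A minimal sketch of what such a vector-input version might look like. The name bin_vector is hypothetical and not part of the original function. It also simplifies the outlier handling to a single boxplot.stats() pass, drops duplicate breaks with unique(), and assumes the input has no missing values, so it is not a drop-in replacement for binning().

```r
# Hypothetical vector-input variant (illustrative name, simplified outlier logic)
bin_vector <- function(x, by = 0.1, eout = TRUE) {
  kept <- x
  if (eout) {
    # Drop boxplot outliers before computing the quantile breaks
    kept <- x[!x %in% boxplot.stats(x)$out]
  }
  breaks <- quantile(kept, probs = seq(0, 1, by = by))
  # Add the overall min and max so that excluded outliers still fall into a bin;
  # unique() removes the duplicates this creates when there are no outliers
  breaks <- unique(c(min(x), breaks, max(x)))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# Apply it to every numeric column and collect the results in a new data frame
set.seed(1)
dat0 <- data.frame(a = letters[1:10], x = rnorm(10), y = rnorm(10), z = rnorm(10))
num <- sapply(dat0, is.numeric)
dat1 <- as.data.frame(lapply(dat0[num], bin_vector, eout = FALSE))
colnames(dat1) <- paste0(names(dat0)[num], "_bin")
```

Because lapply passes each column directly as a vector, there is no need to look columns up by name inside the function.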

Upvotes: 3
