data1082

Reputation: 75

Apply function over columns of dataframe in R, compile results

I've searched here and on Google and haven't found an answer that I can apply to my situation.

Let's say I have a dataframe with columns for Element 1, Element 2, Element 3, Metric, and Other. I have another internal function that takes three arguments (input_dataframe, element_position, metric_position) and performs calculations one element at a time. It outputs a dataframe, let's say 1 row by 3 variables.

I have been trying to use either lapply or for loops to write code that lets me specify the range of columns containing the elements (in the example above, columns 1-3 of the dataframe), run the function for each of those columns against the metric column, and then combine the results into one table with one row per run of the function. I haven't had any luck making this work with variations of lapply and for loops with seq_along. Any suggestions? Sample data, code, and output for my current inefficient solution are below:

#example data
element1 <- c("control", "control", "variation", "variation")
element2 <- c("control", "variation", "variation", "control")
element3 <- c("variation", "control", "variation", "variation")
metric <- c(10,15,20,25)
other <- c(2,4,2,6)
data<-data.frame(element1, element2, element3, metric, other)

#example function (ddply comes from the plyr package)
library(plyr)
test_func <- function(input_df,element_position,metric_position)
{
  df <- input_df[,c(element_position,metric_position)]
  colnames(df) <- c("element","metric")
  mean <- ddply(df,~element,summarise,mean(metric))
  control <- mean[1,2]
  variation <- mean[2,2]
  lift <- (variation-control)/control
  df_table <<- data.frame(control,variation,lift)
}

#call function three times, once for each element, compile results
test_func(data,1,4)
element1 <- df_table
test_func(data,2,4)
element2 <- df_table
test_func(data,3,4)
element3 <- df_table
summary_output <- rbind(element1,element2,element3)

Upvotes: 2

Views: 897

Answers (2)

Silence Dogood

Reputation: 3597

The problem is in the line df_table <<- data.frame(control,variation,lift). The <<- operator assigns outside the function (here, in the global environment) rather than in the function's local environment, so each call overwrites the value left by the previous one. Replacing it with a local assignment and using lapply and rbind gives the result you expected.
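To make the difference between the two assignment operators concrete, here is a minimal base-R sketch. The function names set_global and set_local are hypothetical and exist only for this illustration; they are not part of the question's code.

```r
# Hypothetical helpers, for illustration only.
set_global <- function(x) {
  result <<- x * 2   # assigns `result` outside the function (here, the global environment)
  invisible(NULL)
}

set_local <- function(x) {
  result <- x * 2    # `result` exists only inside this call
  result             # returned to the caller
}

set_global(1)
set_global(5)
result   # only the value from the latest call is left

# Returning the value lets the caller collect one result per call.
values <- vapply(c(1, 5), set_local, numeric(1))
values
```

With the global version, every call writes to the same variable, which is why the question's code had to copy df_table after each call. With the local version, the results can be collected directly by lapply or vapply.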

test_func_modif <- function(input_df,element_position,metric_position)
{
  df <- input_df[,c(element_position,metric_position)]
  colnames(df) <- c("element","metric")
  mean <- ddply(df,~element,summarise,mean(metric))
  control <- mean[1,2]
  variation <- mean[2,2]
  lift <- (variation-control)/control
  df_table <- data.frame(control,variation,lift)
  df_table  # return the result explicitly
}

element_vec  = 1:3
metric_position_value = 4
result_list = lapply(element_vec,function(x) test_func_modif(data,x,metric_position_value))
result_DF = do.call(rbind,result_list)
# > result_DF
#   control variation      lift
# 1    12.5  22.50000 0.8000000
# 2    17.5  17.50000 0.0000000
# 3    15.0  18.33333 0.2222222
# > all.equal(summary_output,result_DF)
# [1] TRUE

Upvotes: 0

rawr

Reputation: 20811

I made some minor changes to your function. You should just return the object and save the result of the function call rather than using <<-.

#example data
element1 <- c("control", "control", "variation", "variation")
element2 <- c("control", "variation", "variation", "control")
element3 <- c("variation", "control", "variation", "variation")
metric <- c(10,15,20,25)
other <- c(2,4,2,6)
data<-data.frame(element1, element2, element3, metric, other)

#example function
test_func <- function(input_df,element_position,metric_position)
{
  require('plyr')
  df <- input_df[,c(element_position,metric_position)]
  colnames(df) <- c("element","metric")
  mean <- ddply(df,~element,summarise,mean(metric))
  control <- mean[1,2]
  variation <- mean[2,2]
  lift <- (variation-control)/control
  data.frame(control,variation,lift)
}

The Map call below passes each set of parameters to test_func in turn:

  1. data, element_position = 1, metric_position = 4
  2. data, element_position = 2, metric_position = 4
  3. data, element_position = 3, metric_position = 4

etc.

do.call('rbind', Map(test_func, rep(list(data), 3), 1:3, rep(4, 3)))

#   control variation      lift
# 1    12.5  22.50000 0.8000000
# 2    17.5  17.50000 0.0000000
# 3    15.0  18.33333 0.2222222
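As a variation on the same idea, here is a sketch that avoids the plyr dependency and the repeated rep() arguments. It assumes the groups are always labelled "control" and "variation", as in the example data. The function name lift_by_name is hypothetical. It uses base R's tapply in place of ddply and passes the constant arguments through Map's MoreArgs.

```r
# Same example data as in the question.
data <- data.frame(
  element1 = c("control", "control", "variation", "variation"),
  element2 = c("control", "variation", "variation", "control"),
  element3 = c("variation", "control", "variation", "variation"),
  metric   = c(10, 15, 20, 25),
  other    = c(2, 4, 2, 6)
)

# Hypothetical base-R variant of test_func: group means are looked up by
# label rather than by row position in the summary table.
lift_by_name <- function(input_df, element_position, metric_position) {
  means <- tapply(input_df[[metric_position]], input_df[[element_position]], mean)
  control   <- means[["control"]]
  variation <- means[["variation"]]
  data.frame(control, variation, lift = (variation - control) / control)
}

# MoreArgs supplies the arguments that are the same for every call,
# so only element_position is iterated over.
results <- Map(lift_by_name, element_position = 1:3,
               MoreArgs = list(input_df = data, metric_position = 4))
summary_output <- do.call(rbind, results)
summary_output
```

Looking the means up by label means the result does not depend on the order in which the groups happen to be sorted.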

Upvotes: 1
