Leilani Konrad

Reputation: 39

How to loop through a variable list and add values to an output dataframe in R?

I'm trying to write a function that takes a dataframe, the name of a main variable, and a list of other variable names, and runs cor.test between the main variable and each variable in the list. I want it to return a dataframe with the variable names, the correlation coefficient, and the p-value.

The code I have so far is:

myCorTest = function(dat, mainVar, varlist)
{
  result = data.frame()
  mainV = dat[[mainVar]]
  for (i in 1:length(varlist)){
    var_select = dat[[varlist[i]]]
    x = cor.test(mainV, var_select)
    R = x$estimate
    p = x$p.value
    result = cbind(mainVar, varlist, R, p)
}
  return(result)
}

I want the output to look like this:

> myCorTest(chol, "bmi", c("sbp", "dbp", "vldl", "hdl", "ldl"))
     var1 var2           R            p
sbp   bmi  sbp  0.14927952 3.877523e-02
dbp   bmi  dbp  0.42636371 6.997094e-10
vldl  bmi vldl  0.41033688 4.107925e-09
hdl   bmi  hdl -0.11984422 9.956239e-02
ldl   bmi  ldl  0.03449137 6.366170e-01

But my outputs are:

> myCorTest(chol, "bmi", c("sbp","dbp", "vldl", "hdl", "ldl"))
     mainVar varlist R                    p                  
[1,] "bmi"   "sbp"   "0.0344913724648321" "0.636617020943996"
[2,] "bmi"   "dbp"   "0.0344913724648321" "0.636617020943996"
[3,] "bmi"   "vldl"  "0.0344913724648321" "0.636617020943996"
[4,] "bmi"   "hdl"   "0.0344913724648321" "0.636617020943996"
[5,] "bmi"   "ldl"   "0.0344913724648321" "0.636617020943996"

Upvotes: 0

Views: 74

Answers (2)

Ronak Shah

Reputation: 388907

Growing objects or dataframes inside a loop is inefficient. I would use lapply instead:

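myCorTest = function(dat, mainVar, varlist) {
  mainV = dat[[mainVar]]
  # Build one single-row data frame per variable, then stack them with rbind
  result = do.call(rbind, lapply(varlist, function(x) {
    temp = cor.test(mainV, dat[[x]])
    R = temp$estimate
    p = temp$p.value
    data.frame(mainVar = mainVar, varlist = x, R, p)
  }))
  # Drop the row names that rbind derives from the "cor" label on the estimate
  rownames(result) = NULL
  return(result)
}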

myCorTest(mtcars, 'mpg', c('cyl', 'am'))

#  mainVar varlist      R        p
#1     mpg     cyl -0.852 6.11e-10
#2     mpg      am  0.600 2.85e-04
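
As a variation on the same idea, the columns can be built as plain vectors and assembled into a data frame once, which also gives the row names and column names asked for in the question. This is only a minimal sketch: `cor_table` is a hypothetical name, and the built-in mtcars data stands in for the asker's chol data.

```r
# Sketch only: `cor_table` is a hypothetical helper, not part of the answer above.
cor_table <- function(dat, mainVar, varlist) {
  # Run cor.test once per variable and keep the full test objects
  tests <- lapply(varlist, function(v) cor.test(dat[[mainVar]], dat[[v]]))
  data.frame(
    var1 = mainVar,
    var2 = varlist,
    # unname() drops the "cor" label that cor.test attaches to the estimate
    R = vapply(tests, function(tt) unname(tt$estimate), numeric(1)),
    p = vapply(tests, function(tt) tt$p.value, numeric(1)),
    row.names = varlist,
    stringsAsFactors = FALSE
  )
}

res <- cor_table(mtcars, "mpg", c("cyl", "am"))
print(res)
```

Because vapply declares numeric(1) as the expected result, R and p are guaranteed to come back as numeric columns rather than being coerced.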

Upvotes: 1

Vinson Ciawandy

Reputation: 1166

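There are two problems with your code. First, cbind on plain vectors creates a matrix, and a matrix needs all of its values to have the same data type, so your numbers are coerced to character. Second, result is overwritten on every iteration using the whole varlist, so only the R and p from the last variable survive and are recycled across all rows. What you need is to create a data.frame row for each variable and append it. Try this: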

myCorTest = function(dat, mainVar, varlist)
{
  # Create an empty data.frame with typed columns to store all results
  result = data.frame(var1 = character(),
                      var2 = character(),
                      R = numeric(),
                      p = numeric())
  mainV = dat[[mainVar]]
  # seq_along() is safe even when varlist is empty, unlike 1:length(varlist)
  for (i in seq_along(varlist)) {
    var_select = dat[[varlist[i]]]
    x = cor.test(mainV, var_select)
    R = x$estimate
    p = x$p.value
    # One row for this variable, labelled with the variable name
    result_temp = data.frame(mainVar, varlist[i], R, p)
    row.names(result_temp) = varlist[i]
    result = rbind(result, result_temp)
  }
  colnames(result) = c("var1", "var2", "R", "p")
  return(result)
}

myCorTest(chol, "bmi", c("sbp", "dbp", "vldl", "hdl", "ldl"))
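
To see the coercion in isolation, here is a small sketch with made-up values (the 0.5 is a placeholder, not a real correlation). It compares what cbind and data.frame do with the same mixed inputs:

```r
# cbind() on bare vectors returns a matrix; mixing character and numeric
# values coerces every cell to character, and shorter inputs are recycled.
m <- cbind(mainVar = "bmi", varlist = c("sbp", "dbp"), R = 0.5)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# data.frame() keeps one type per column, so R stays numeric
d <- data.frame(mainVar = "bmi", varlist = c("sbp", "dbp"), R = 0.5)
is.numeric(d$R)   # TRUE
```

The recycling of the single "bmi" and 0.5 across both rows is the same mechanism that repeated the last R and p on every row of the original output.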

Upvotes: 2
