Reputation: 39
I'm trying to write a function that takes a dataframe, a main variable, and a list of variables and uses the cor.test function. I'm looking for it to return a dataframe with the variable names and the correlation coefficient and p-value.
The code I have so far is:
myCorTest = function(dat, mainVar, varlist)
{
  result = data.frame()
  mainV = dat[[mainVar]]
  for (i in 1:length(varlist)){
    var_select = dat[[varlist[i]]]
    x = cor.test(mainV, var_select)
    R = x$estimate
    p = x$p.value
    result = cbind(mainVar, varlist, R, p)
  }
  return(result)
}
I want the output to look like this:
> myCorTest(chol, "bmi", c("sbp", "dbp", "vldl", "hdl", "ldl"))
var1 var2 R p
sbp bmi sbp 0.14927952 3.877523e-02
dbp bmi dbp 0.42636371 6.997094e-10
vldl bmi vldl 0.41033688 4.107925e-09
hdl bmi hdl -0.11984422 9.956239e-02
ldl bmi ldl 0.03449137 6.366170e-01
But my outputs are:
> myCorTest(chol, "bmi", c("sbp","dbp", "vldl", "hdl", "ldl"))
mainVar varlist R p
[1,] "bmi" "sbp" "0.0344913724648321" "0.636617020943996"
[2,] "bmi" "dbp" "0.0344913724648321" "0.636617020943996"
[3,] "bmi" "vldl" "0.0344913724648321" "0.636617020943996"
[4,] "bmi" "hdl" "0.0344913724648321" "0.636617020943996"
[5,] "bmi" "ldl" "0.0344913724648321" "0.636617020943996"
Upvotes: 0
Views: 74
Reputation: 388907
Growing objects/dataframes in a loop is inefficient. I would use lapply:
myCorTest = function(dat, mainVar, varlist) {
  mainV = dat[[mainVar]]
  # Build one single-row data frame per variable, then stack them once
  result = do.call(rbind, lapply(varlist, function(x) {
    temp = cor.test(mainV, dat[[x]])
    R = temp$estimate
    p = temp$p.value
    data.frame(mainVar = mainVar, varlist = x, R, p)
  }))
  rownames(result) <- NULL
  return(result)
}
myCorTest(mtcars, 'mpg', c('cyl', 'am'))
# mainVar varlist R p
#1 mpg cyl -0.852 6.11e-10
#2 mpg am 0.600 2.85e-04
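As a variation on the same idea, here is a minimal sketch that builds each output column as a whole vector with vapply, so nothing is grown or stacked row by row. The function name cor_table and the column names var1/var2 are my own choices for illustration (var1/var2 follow the desired output in the question), and the sketch assumes varlist has at least one entry.

```r
# Sketch only: cor_table is a hypothetical name, not from the posts above.
cor_table <- function(dat, mainVar, varlist) {
  mainV <- dat[[mainVar]]
  # Run cor.test once per variable and keep the resulting htest objects
  tests <- lapply(varlist, function(v) cor.test(mainV, dat[[v]]))
  data.frame(
    var1 = mainVar,   # length-1 value, recycled to one entry per row
    var2 = varlist,
    # vapply checks that each test yields exactly one numeric value
    R = vapply(tests, function(ct) unname(ct$estimate), numeric(1)),
    p = vapply(tests, function(ct) ct$p.value, numeric(1)),
    stringsAsFactors = FALSE
  )
}

res <- cor_table(mtcars, "mpg", c("cyl", "am"))
print(res)
```

Because the columns are assembled in a single data.frame call, R and p stay numeric and there is no need to reset the row names afterwards.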
Upvotes: 1
Reputation: 1166
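The problem with your code is that cbind creates a matrix, and a matrix requires all of its values to have the same data type, so the numeric R and p are coerced to character. In addition, result is overwritten on every pass through the loop with the full varlist, so only the last R and p survive and are recycled across every row. What you need is to build a data.frame and append one row per variable. Try this: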
myCorTest = function(dat, mainVar, varlist)
{
  # Create empty data.frame to store all results with its data types
  result = data.frame(var1=character(),
                      var2=character(),
                      R=numeric(),
                      p=numeric()
  )
  mainV = dat[[mainVar]]
  # seq_along() is safe even when varlist is empty, unlike 1:length(varlist)
  for (i in seq_along(varlist)){
    var_select = dat[[varlist[i]]]
    x = cor.test(mainV, var_select)
    R = x$estimate
    p = x$p.value
    # One-row data.frame for this variable, so R and p stay numeric
    result_temp = data.frame(mainVar, varlist[i], R, p)
    row.names(result_temp) = varlist[i]
    # Append the row instead of overwriting result
    result = rbind(result,result_temp)
  }
  colnames(result) = c("var1","var2","R","p")
  return(result)
}
myCorTest(chol, "bmi", c("sbp", "dbp", "vldl", "hdl", "ldl"))
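To see the coercion described at the start of this answer in isolation, here is a small sketch. The numbers are made-up placeholder values, not results from the chol data.

```r
# Hypothetical values, chosen only to show the type coercion
m <- cbind("bmi", "sbp", 0.149, 0.0388)
is.matrix(m)   # cbind on plain vectors gives a matrix
typeof(m)      # every cell is stored as character

d <- data.frame(var1 = "bmi", var2 = "sbp", R = 0.149, p = 0.0388)
sapply(d, is.numeric)   # each column keeps its own type
```

This is why the R and p columns in the question's output are printed in quotes: once they sit in a character matrix they are strings, not numbers.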
Upvotes: 2