Reputation: 43
I am fairly new to R. My data looks something like this (only with 9000 columns and 66 rows):
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660,61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame (Time, ID1, ID2, ID3)
I want to get a data frame that looks like this:
ID1, rho, p-value
ID2, rho, p-value
...
The rho and the p-value would come from a Spearman cor.test between Time and each ID column.
Among other things I've tried this:
results <- data.frame(ID="", Estimate="", P.value="")
estimates = numeric(16)
pvalues = numeric(16)
for (i in 2:4){
  test <- cor.test(DF[,1], DF[,i])
  estimates[i] = test$estimate
  pvalues[i] = test$p.value
}
And R gives me the following error:
Error: object 'test' not found
I've also tried:
result <- do.call(rbind, lapply(2:4, function(x) {
  cor.result <- cor.test(DF[,1], DF[,x])
  pvalue <- cor.result$p.value
  estimate <- cor.result$estimate
  return(data.frame(pvalue = pvalue, estimate = estimate))
}))
And R gives me a similar error:
Error: object 'cor.result' not found
I'm sure it's an easy fix but I can't seem to figure it out. Any help is more than welcome.
This is what I got after running
dput(head(SmallDataset[,1:5]))
structure(list(Species = c("Human.hsapiens", "Chimpanzee.ptroglodytes",
"Gorilla.ggorilla", "Orangutan.pabelii", "Gibbon.nleucogenys",
"Macaque.mmulatta"), Time = c(0, 6.4, 8.61, 15.2, 19.43, 28.1
), ID1 = c(55030, 54539, 54937, 48897, 58160, 54686), ID2 = c(20485,
11907, 10571, 20974, 10462, 11149), ID3 = c(93914, 44482, 43705,
51144, 49485, 43908)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
Upvotes: 0
Views: 111
Reputation: 107652
Consider building a list of data frames with lapply (an iteration function similar to for, except that it returns a list with the same length as its input). Afterwards, row-bind all the data frame elements together:
results <- lapply(2:4, function(i){
  test <- cor.test(DF[,1], DF[,i])
  data.frame(ID = names(DF)[i],
             estimate = unname(test$estimate),
             pvalues = unname(test$p.value))
})
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 -0.6238591 0.009805341
# 2 ID2 -0.2270515 0.455676037
# 3 ID3 -0.4964092 0.050481533
NOTE: Your posted Time vector has only 15 values while the ID vectors have 16, so data.frame() fails because the lengths differ. To get around this, I appended a sixth 88 at the end:
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
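The question asks for Spearman's rho, whereas cor.test defaults to Pearson when no method is given. A minimal sketch of the same lapply pattern with method = "spearman" follows. The names spearman_results and spearman_df are placeholders of mine, and exact = FALSE is my own choice to skip the exact p-value, which cor.test cannot compute when there are ties (Time has repeated values):

```r
# Data as posted, with the sixth 88 appended to Time
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660, 61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame(Time, ID1, ID2, ID3)

# One small data frame per ID column, holding Spearman's rho and its p-value
spearman_results <- lapply(2:4, function(i) {
  # cor.test drops incomplete pairs itself, so the NAs in ID2 are skipped
  test <- cor.test(DF$Time, DF[[i]], method = "spearman", exact = FALSE)
  data.frame(ID = names(DF)[i],
             rho = unname(test$estimate),
             p.value = test$p.value)
})

# Stack the per-column results into a single data frame
spearman_df <- do.call(rbind, spearman_results)
spearman_df
```

The estimates will differ from the Pearson values shown above, since Spearman works on ranks.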
Using posted SmallDataset:
SmallDataset <- structure(...)
results <- lapply(3:5, function(i){
  # [[i]] extracts the column as a plain vector, even when SmallDataset is a tibble
  test <- cor.test(SmallDataset$Time, SmallDataset[[i]])
  data.frame(ID = names(SmallDataset)[i],
             estimate = unname(test$estimate),
             pvalues = unname(test$p.value))
})
final_df <- do.call(rbind, results)
final_df
# ID estimate pvalues
# 1 ID1 0.03251407 0.9512461
# 2 ID2 -0.41733336 0.4103428
# 3 ID3 -0.60732484 0.2010166
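With roughly 9000 ID columns, hard-coding positions such as 2:4 or 3:5 gets awkward. One possible sketch is to select the columns by name and let vapply collect both numbers in a single pass. The names id_cols, stats and wide_df are illustrative only, and the code assumes every column other than Time is a numeric ID column:

```r
# Data as posted, with the sixth 88 appended to Time
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 88)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660, 61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame(Time, ID1, ID2, ID3)

# Every column except Time is treated as an ID column (assumption)
id_cols <- setdiff(names(DF), "Time")

# vapply returns a 2 x length(id_cols) numeric matrix: one column per ID
stats <- vapply(id_cols, function(col) {
  test <- cor.test(DF$Time, DF[[col]])
  c(estimate = unname(test$estimate), p.value = test$p.value)
}, FUN.VALUE = c(estimate = 0, p.value = 0))

# Transpose so each ID becomes a row, then attach the ID names as a column
wide_df <- data.frame(ID = id_cols, t(stats), row.names = NULL)
wide_df
```

For a data set with a non-numeric column such as Species, that column would have to be removed from id_cols as well.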
Upvotes: 0
Reputation: 3060
My solution involves defining a function within a lapply call:
library(dplyr)
## Create the data frame
Time <- c(0, 6.4, 8.6, 15.2, 19.4, 28.1, 42.6, 73, 73, 85, 88, 88, 88, 88, 88, 89)
ID1 <- c(55030, 54539, 54937, 48897, 58160, 54686, 55393, 47191, 39805, 37601, 51328, 28882, 45587, 60061, 31892, 28670)
ID2 <- c(20485, 11907, 10571, 20974, 10462, 11149, 20970, NA, NA, 9295, NA, 8714, 24446, 10748, 9037, 11859)
ID3 <- c(93914, 44482, 43705, 51144, 49485, 43908, 44324, 37342, 18872, 39660,61673, 43837, 36528, 44738, 41648, 11100)
DF <- data.frame (Time, ID1, ID2, ID3)
## Run the correlations (one cor.test result per ID column)
l2 <- lapply(2:4, function(i) cor.test(DF$Time, DF[,i]))
## Extract the estimate and p-value from each test result
l3 <- lapply(l2, function(test){
  tibble(estimate = test$estimate,
         p_value = test$p.value)
})
## Bind everything into one data frame and label the rows
l4 <- bind_rows(l3) %>% mutate(ID = paste0("ID", 1:3))
l4
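With thousands of columns, a single column on which cor.test stops (for example one that is entirely NA) would abort the whole lapply call. A minimal sketch, assuming you would rather record NA for such a column and carry on, wraps the call in tryCatch. The helper name safe_cor is hypothetical, not part of any package:

```r
# Hypothetical helper: returns the estimate and p-value,
# or NA for both when cor.test raises an error
safe_cor <- function(x, y) {
  tryCatch({
    test <- cor.test(x, y)
    c(estimate = unname(test$estimate), p.value = test$p.value)
  }, error = function(e) {
    c(estimate = NA_real_, p.value = NA_real_)
  })
}

# A usable column gives ordinary numbers
ok <- safe_cor(c(1, 2, 3, 4, 5), c(2, 1, 4, 3, 6))

# An all-NA column leaves no complete pairs, so cor.test errors
# and the helper falls back to NA instead of stopping
bad <- safe_cor(c(1, 2, 3, 4, 5), rep(NA_real_, 5))
```

Inside the lapply above, safe_cor(DF$Time, DF[,i]) could then stand in for the direct cor.test call.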
Upvotes: 1