I have a large dataset of accuracies. As an example:
acc
V1 V2
1 0.65996025 B1
2 0.55217749 B1
3 0.78412743 B1
4 0.95358681 B1
5 0.23634827 B2
6 0.35234372 B2
7 0.21214891 B2
8 0.03710918 B2
9 0.84751145 B3
10 0.89086948 B3
11 0.59060242 B3
12 0.68724963 B3
I made subgroups:
B1 = acc[acc$V2 == "B1",]
B2 = acc[acc$V2 == "B2",]
B3 = acc[acc$V2 == "B3",]
I want the pairwise differences between the groups, like:
diff_1_2 = B1$V1 - B2$V1
diff_1_3 = B1$V1 - B3$V1
diff_2_3 = B2$V1 - B3$V1
I then want to use each difference to calculate a p-value with the following code:
t.value = mean(diff_1_2) / sd(diff_1_2)
p.value = 2*pt(-abs(t.value), df=length(diff_1_2)-1)
sig <- ifelse(p.value < 0.05, "sig", "no")
As you can see, this is very inefficient. How can I do this in a loop, so that at the end I get a table like this, for example:
results
B1_B2 sig
B1_B3 sig
B2_B3 sig
Any ideas? Thank you in advance.
Using your (properly formatted) data:
acc <- tibble::tribble(
~V1, ~V2,
0.65996025, "B1",
0.55217749, "B1",
0.78412743, "B1",
0.95358681, "B1",
0.23634827, "B2",
0.35234372, "B2",
0.21214891, "B2",
0.03710918, "B2",
0.84751145, "B3",
0.89086948, "B3",
0.59060242, "B3",
0.68724963, "B3"
)
You can split it like so:
split <- split(acc, ~V2)
You could then define your test function (after some debugging):
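your_test <- function(values) {
  # statistic as defined in the question: mean divided by standard deviation
  t.value <- mean(values) / sd(values)
  # two-sided p-value from the t distribution with n - 1 degrees of freedom
  p.value <- 2 * pt(-abs(t.value), df = length(values) - 1)
  ifelse(p.value < 0.05, "sig", "no")
}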
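One thing worth checking before relying on these labels: the statistic from the question, `mean / sd`, lacks the `sqrt(n)` factor of the usual paired t statistic, `mean(d) / (sd(d) / sqrt(n))`, which is what `t.test(x, y, paired = TRUE)` computes. Whether the question's definition is intentional is an assumption on my part, so treat this as a side note. The minimal sketch below uses the B1 and B2 values from the question (the names `b1`, `b2`, `d` are just illustrative) and cross-checks a corrected version against `t.test`:

```r
# Example accuracies for groups B1 and B2 from the question
b1 <- c(0.65996025, 0.55217749, 0.78412743, 0.95358681)
b2 <- c(0.23634827, 0.35234372, 0.21214891, 0.03710918)

d <- b1 - b2       # paired differences
n <- length(d)     # number of pairs

# Statistic as written in the question: mean / sd (no sqrt(n) factor)
t_question <- mean(d) / sd(d)
p_question <- 2 * pt(-abs(t_question), df = n - 1)

# Standard paired t statistic: mean / (sd / sqrt(n))
t_paired <- mean(d) / (sd(d) / sqrt(n))
p_paired <- 2 * pt(-abs(t_paired), df = n - 1)

# Built-in paired t-test, used only as a cross-check
builtin <- t.test(b1, b2, paired = TRUE)

all.equal(unname(builtin$statistic), t_paired)  # TRUE
all.equal(builtin$p.value, p_paired)            # TRUE
```

For B1 minus B2 the corrected statistic is roughly 3.51 on 3 degrees of freedom (p below 0.05), whereas `mean / sd` gives roughly 1.76 (p above 0.05), so the choice changes the B1_B2 label. If the paired test is what you want, `t.test(values)$p.value` could be used inside `your_test`, since a one-sample t-test on the differences is equivalent to the paired test.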
And plug it into a purrr-style mapping/reducing pipeline:
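library(purrr)
unique(acc$V2) %>%
  combn(2, simplify = FALSE) %>%                   # list of all pairs of group labels
  set_names(map(., paste, collapse = "_")) %>%     # name each pair, e.g. "B1_B2"
  map(~ split[[.x[1]]]$V1 - split[[.x[2]]]$V1) %>% # element-wise difference per pair
  imap(~ data.frame(results = your_test(.x), row.names = .y)) %>% # one-row result per pair
  reduce(rbind)                                    # stack the rows into one table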
Returning:
results
B1_B2 no
B1_B3 no
B2_B3 sig
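If you would rather avoid the purrr/tibble dependencies, here is a base-R sketch of the same idea (not the only way to do it): `split()` on the `V1` column alone gives a named list of numeric vectors, `combn()` enumerates the label pairs, and `apply()` runs the test on each pair's differences. The names `groups`, `label_pairs` and `results` are illustrative, `your_test` is restated so the block runs on its own, and, like the pipeline above, it assumes every group has the same size and row order so that element-wise subtraction pairs the right observations.

```r
# Example data from the question
acc <- data.frame(
  V1 = c(0.65996025, 0.55217749, 0.78412743, 0.95358681,
         0.23634827, 0.35234372, 0.21214891, 0.03710918,
         0.84751145, 0.89086948, 0.59060242, 0.68724963),
  V2 = rep(c("B1", "B2", "B3"), each = 4)
)

# Same test as above: mean / sd, two-sided p-value on n - 1 df
your_test <- function(values) {
  t.value <- mean(values) / sd(values)
  p.value <- 2 * pt(-abs(t.value), df = length(values) - 1)
  ifelse(p.value < 0.05, "sig", "no")
}

# Named list of numeric vectors, one per group (B1, B2, B3)
groups <- split(acc$V1, acc$V2)

# 2 x k character matrix: each column is one pair of group labels
label_pairs <- combn(names(groups), 2)

# Apply the test to the element-wise difference of each pair
results <- data.frame(
  results = apply(label_pairs, 2,
                  function(p) your_test(groups[[p[1]]] - groups[[p[2]]])),
  row.names = apply(label_pairs, 2, paste, collapse = "_")
)
print(results)
```

With the example data this should reproduce the table above (`no`, `no`, `sig`).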