Reputation: 403
I am new to R and I have a question about extracting data from a list with multiple groups. For example, I have a set of data like this:
data(iris)
iris$Group = rep(c("High","Low"), each=5)
iris = iris[sample(nrow(iris)),]
mylist = list(iris[1:50,], iris[51:100,], iris[101:150,])
head(mylist[[1]])
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Group
51           7.0         3.2          4.7         1.4 versicolor  High
123          7.7         2.8          6.7         2.0  virginica  High
147          6.3         2.5          5.0         1.9  virginica   Low
23           4.6         3.6          1.0         0.2     setosa  High
120          6.0         2.2          5.0         1.5  virginica   Low
141          6.7         3.1          5.6         2.4  virginica  High
Within each list element, I would like to group by Species and calculate the p-value from a t.test of Sepal.Length between Group High and Low. For example, I would like to get the p-value for Group High versus Low within Species virginica, and so on for each species in each list element.
I am confused about this. Could anyone help? Thanks!
Upvotes: 0
Views: 181
Reputation: 50678
In base R you can do the following:
lapply(mylist, function(x)
    with(x, t.test(
        Sepal.Length[Group == "High"],
        Sepal.Length[Group == "Low"])$p.value))
#[[1]]
#[1] 0.2751545
#
#[[2]]
#[1] 0.5480918
#
#[[3]]
#[1] 0.864256
Or a purrr/tidyverse approach:
library(tidyverse)
bind_rows(mylist, .id = "id") %>%
    group_by(id) %>%
    nest() %>%
    mutate(pval = map_dbl(data, ~t.test(
        .x$Sepal.Length[.x$Group == "High"],
        .x$Sepal.Length[.x$Group == "Low"])$p.value))
## A tibble: 3 x 3
# id data pval
# <chr> <list> <dbl>
#1 1 <tibble [50 × 6]> 0.275
#2 2 <tibble [50 × 6]> 0.548
#3 3 <tibble [50 × 6]> 0.864
To perform t-tests of Sepal.Length between Group = "Low" and Group = "High" within each Species, you can do:
lapply(mylist, function(x)
    with(x, setNames(sapply(unique(Species), function(y)
        t.test(
            Sepal.Length[Group == "High" & Species == y],
            Sepal.Length[Group == "Low" & Species == y])$p.value),
        unique(Species))))
#[[1]]
#versicolor virginica setosa
#0.80669755 0.07765262 0.47224383
#
#[[2]]
# setosa virginica versicolor
# 0.6620094 0.2859713 0.2427945
#
#[[3]]
#versicolor setosa virginica
# 0.5326379 0.6412661 0.5477179
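The same per-species test can also be written with split() and the formula interface of t.test, which may be easier to read. This is only a sketch under a few assumptions: the data are rebuilt as in the question, set.seed(1) is added purely so the shuffle is repeatable (so the numbers will not match the output above), and the tryCatch() wrapper is a defensive choice that returns NA when a species has too few rows in one group.

```r
# Rebuild the example data from the question.
data(iris)
set.seed(1)  # assumption: fixed seed only to make the shuffle repeatable
iris$Group <- rep(c("High", "Low"), each = 5)
iris <- iris[sample(nrow(iris)), ]
mylist <- list(iris[1:50, ], iris[51:100, ], iris[101:150, ])

# For each list element, split by Species and run a Welch t-test of
# Sepal.Length between the two Group levels via the formula interface.
pvals <- lapply(mylist, function(x) {
  sapply(split(x, x$Species), function(d) {
    # Return NA instead of stopping if a species cannot be tested
    # (for example, fewer than two observations in one group).
    tryCatch(
      t.test(Sepal.Length ~ Group, data = d)$p.value,
      error = function(e) NA_real_
    )
  })
})

pvals[[1]]  # named numeric vector: setosa, versicolor, virginica
```

Because split() orders the pieces by the factor levels of Species, the names come out in the same order for every list element, unlike with unique(Species).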
Keep in mind that the raw p-values should be adjusted for multiple hypothesis testing, since several tests are run within each list element.
To account for this, you could modify the above code slightly to give
lapply(mylist, function(x)
    with(x, p.adjust(setNames(sapply(unique(Species), function(y)
        t.test(
            Sepal.Length[Group == "High" & Species == y],
            Sepal.Length[Group == "Low" & Species == y])$p.value),
        unique(Species)))))
#[[1]]
#versicolor virginica setosa
# 0.9444877 0.2329579 0.9444877
#
#[[2]]
# setosa virginica versicolor
# 0.7283836 0.7283836 0.7283836
#
#[[3]]
#versicolor setosa virginica
# 1 1 1
Here we use p.adjust with the default Holm correction.
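To see what the Holm correction does, here is a small sketch that reproduces it by hand for the three raw p-values of the first list element shown above. The variable names (p_raw, p_holm) are just illustrative. The smallest p-value is multiplied by 3, the next by 2, the largest by 1, then the results are made non-decreasing and capped at 1.

```r
# Raw p-values of the first list element, copied from the output above.
p_raw <- c(versicolor = 0.80669755, virginica = 0.07765262, setosa = 0.47224383)

n <- length(p_raw)
o <- order(p_raw)  # indices from smallest to largest p-value

# Holm: multiply the i-th smallest p-value by (n - i + 1), enforce
# monotonicity with cummax(), and cap the result at 1.
p_holm <- p_raw
p_holm[o] <- pmin(cummax((n - seq_len(n) + 1) * p_raw[o]), 1)

p_holm
# versicolor  virginica     setosa
#  0.9444877  0.2329579  0.9444877

# Same result as the built-in function.
all.equal(p_holm, p.adjust(p_raw, method = "holm"))
```

If a less conservative correction is preferred, p.adjust also accepts other values for method, for example method = "BH" for Benjamini-Hochberg.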
Upvotes: 1