kin182
kin182

Reputation: 403

How to extract data from a list with multiple groups?

I am new to R and I have a question about extracting data from a list with multiple groups. For example, I have a set of data like this:

data(iris)

iris$Group = rep(c("High","Low", each=5))
iris = iris[sample(nrow(iris)),]
mylist = list(iris[1:50,], iris[51:100,], iris[101:150,])

head(mylist)[[1]]

    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species Group
51           7.0         3.2          4.7         1.4 versicolor  High
123          7.7         2.8          6.7         2.0  virginica  High
147          6.3         2.5          5.0         1.9  virginica   Low
23           4.6         3.6          1.0         0.2     setosa  High
120          6.0         2.2          5.0         1.5  virginica   Low
141          6.7         3.1          5.6         2.4  virginica  High

Within each list, I would like to group by Species and calculate the P value by t.test of Sepal.Length between Group High and Low. For example, I would like to get the P value of between Group High and Low of Species virginica, and so on for each list.

I am confused about this. Could anyone help? Thanks!

Upvotes: 0

Views: 181

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50678

In base R you can do the following

lapply(mylist, function(x)
    with(x, t.test(Sepal.Length[Group == "High"], Sepal.Length[Group == "Low"])$p.value))
#[[1]]
#[1] 0.2751545
#
#[[2]]
#[1] 0.5480918
#
#[[3]]
#[1] 0.864256

Or a purrr/tidyverse approach

library(tidyverse)
bind_rows(mylist, .id = "id") %>%
    group_by(id) %>%
    nest() %>%
    mutate(pval = map_dbl(data, ~t.test(
        .x$Sepal.Length[.x$Group == "High"],
        .x$Sepal.Length[.x$Group == "Low"])$p.value))
## A tibble: 3 x 3
#  id    data               pval
#  <chr> <list>            <dbl>
#1 1     <tibble [50 × 6]> 0.275
#2 2     <tibble [50 × 6]> 0.548
#3 3     <tibble [50 × 6]> 0.864

Update

To perform t-tests of Sepal.Length between Group = "Low" and Group = "High" within Species you can do

lapply(mylist, function(x)
    with(x, setNames(sapply(unique(Species), function(y)
        t.test(Sepal.Length[Group == "High" & Species == y], Sepal.Length[Group == "Low" & Species == y])$p.value), unique(Species))))
#[[1]]
#versicolor  virginica     setosa
#0.80669755 0.07765262 0.47224383
#
#[[2]]
#    setosa  virginica versicolor
# 0.6620094  0.2859713  0.2427945
#
#[[3]]
#versicolor     setosa  virginica
# 0.5326379  0.6412661  0.5477179

Keep in mind that you will have to adjust raw p-values for multiple hypothesis testing.

To account for multiple hypothesis testing, you could modify above code slightly to give

lapply(mylist, function(x)
    with(x, p.adjust(setNames(sapply(unique(Species), function(y)
        t.test(Sepal.Length[Group == "High" & Species == y], Sepal.Length[Group == "Low" & Species == y])$p.value), unique(Species)))))
#[[1]]
#versicolor  virginica     setosa
# 0.9444877  0.2329579  0.9444877
#
#[[2]]
#    setosa  virginica versicolor
# 0.7283836  0.7283836  0.7283836
#
#[[3]]
#versicolor     setosa  virginica
#         1          1          1

Here we use p.adjust with the default Holm correction.

Upvotes: 1

Related Questions