esm

Reputation: 105

Apply a function to all columns, with one column reused in every call, using map

Here is my data

as_tibble(data)
# A tibble: 40 x 4
   Trt        V1      V2      V3
   <fct>   <dbl>   <dbl>   <dbl>
 1 d1    0.0105  0.00940 0.0174 
 2 d1    0.0199  0.00897 0.00279
 3 d1    0.00836 0.0104  0.00816
 4 d1    0.00960 0.0131  0.00404
 5 d1    0.00527 0.0123  0.00863
 6 d1    0.0136  0.0115  0.0130 
 7 d1    0.0216  0.00591 0.0106 
 8 d1    0.00558 0.00890 0.00964
 9 d2    0.0193  0.0116  0.0199 
10 d2    0.0172  0.0165  0.0582 
# ... with 30 more rows

I want to run aov() of each V* column against Trt and then compute further statistics, all wrapped in a function f2:

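library(dplyr)    # %>%, select(), pull()
library(emmeans)  # emmeans(), pairs(), contrast()

f2 <- function(y, Trt){
  # one-way ANOVA of y on Trt, then estimated marginal means per Trt level
  dt1 <- aov(y ~ Trt) %>%
    emmeans(specs = "Trt")

  # contrast coefficients; column 1 is Trt, so with five levels columns 2:5
  # are the first four pairwise contrasts (first level vs. each other level)
  dt2 <- coef(pairs(dt1)) %>%
    select(2:5)

  # Dunnett-adjusted p-values for those four contrasts
  d3 <- contrast(dt1, dt2, adjust = "Dunnett") %>%
    summary %>%
    pull(p.value)

  return(d3)
}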

I get the desired results when I run it on one V* column at a time against Trt:

f2(data$V1, data$Trt)
[1] 5.450331e-01 5.936861e-01 2.302477e-02 7.882583e-15

f2(data$V2, data$Trt)
[1] 5.217088e-01 1.722111e-01 4.030167e-05 4.439782e-13

I want to apply f2 to all columns whose names start with V. This code gave an error:

map2_dfr(data %>% select_if(is.double), data$Trt, f2)
Error: Mapped vectors must have consistent lengths:
* `.x` has length 3
* `.y` has length 40

I don't understand why map2_dfr can't pick one column at a time. Any help?
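The error comes from how map2() pairs its inputs: it takes the i-th element of .x together with the i-th element of .y, so both must have the same length (or length 1). A data frame is a list of its columns, so `data %>% select_if(is.double)` has length 3, while `data$Trt` has length 40. A minimal sketch of the usual fix is to iterate over the columns only with map() and pass Trt as a fixed extra argument. It assumes toy data and a hypothetical stand-in function `group_means()` in place of f2, so it runs without emmeans:

```r
library(purrr)

# Hypothetical toy data shaped like the question's: a factor plus three numeric columns
set.seed(1)
toy <- data.frame(
  Trt = factor(rep(c("d1", "d2"), each = 4)),
  V1 = rnorm(8), V2 = rnorm(8), V3 = rnorm(8)
)

# A data frame is a list of columns, so length() counts columns, not rows
num_cols <- toy[sapply(toy, is.double)]
length(num_cols)  # 3 columns
length(toy$Trt)   # 8 rows

# map2(num_cols, toy$Trt, ...) would fail: .x and .y must have equal lengths.
# map() walks over the columns only; Trt is passed unchanged to every call.
# group_means() is a hypothetical stand-in for f2(y, Trt).
group_means <- function(y, Trt) tapply(y, Trt, mean)
res <- map(num_cols, group_means, Trt = toy$Trt)
names(res)        # "V1" "V2" "V3"
```

With the question's own f2, the same pattern would be `map(data %>% select_if(is.double), f2, Trt = data$Trt)` (or `sapply()` instead of `map()` to get a matrix with one column per variable); this has not been run against the original data.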

Upvotes: 0

Views: 88

Answers (1)

agila

Reputation: 3492

I would do something like this. First, I load some packages and create random data with the same structure as yours.

library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)
library(emmeans)

data <- tibble::tibble(
  Trt = factor(rep(c("A", "B", "C", "D", "E"), each = 8)), 
  V1 = rnorm(40), 
  V2 = rnorm(40), 
  V3 = rnorm(40)
)

I slightly modified the definition of f2: it now takes a data frame and a character string giving the aov formula.

f2 <- function(data, aov_formula){

  dt1 <- aov(as.formula(aov_formula), data) %>%
    emmeans(specs = "Trt")

  dt2 <- coef(pairs(dt1)) %>%
    select(2:5)

  d3 <- contrast(dt1, dt2, adjust = "Dunnett") %>%
    summary %>%
    pull(p.value)

  d3
}
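Because this version of f2 receives the formula as a string, one could also loop over formula strings directly, without reshaping the data first. The sketch below illustrates the idea with a hypothetical stand-in, `aov_p()`, that has the same interface but returns only the overall ANOVA p-value of the Trt term (so it runs without emmeans); it is not the Dunnett procedure of f2.

```r
library(purrr)

# Random data with the same structure as in this answer
set.seed(42)
data <- data.frame(
  Trt = factor(rep(c("A", "B", "C", "D", "E"), each = 8)),
  V1 = rnorm(40), V2 = rnorm(40), V3 = rnorm(40)
)

# Hypothetical stand-in for f2(data, aov_formula): same arguments,
# but returns only the p-value of the Trt row of the ANOVA table
aov_p <- function(data, aov_formula) {
  fit <- aov(as.formula(aov_formula), data = data)
  summary(fit)[[1]][["Pr(>F)"]][1]
}

# One formula per numeric column: "V1 ~ Trt", "V2 ~ Trt", "V3 ~ Trt"
formulas <- paste(c("V1", "V2", "V3"), "~ Trt")

# data is passed by name, so each formula string fills aov_formula
p_values <- map_dbl(formulas, aov_p, data = data)
names(p_values) <- formulas
p_values
```

Swapping `aov_p` for f2 (and `map()` for `map_dbl()`, since f2 returns four values) should follow the same pattern, though that is untested here.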

Next, I "tidy" your data into long format with gather():

data <- data %>% 
  gather("index", "y", -Trt)
data
#> # A tibble: 120 x 3
#>    Trt   index       y
#>    <fct> <chr>   <dbl>
#>  1 A     V1     0.347 
#>  2 A     V1    -0.0837
#>  3 A     V1     0.389 
#>  4 A     V1     0.0358
#>  5 A     V1    -1.45  
#>  6 A     V1     0.0621
#>  7 A     V1     0.449 
#>  8 A     V1    -1.32  
#>  9 B     V1    -0.946 
#> 10 B     V1    -0.0518
#> # ... with 110 more rows
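gather() still works but has been superseded by pivot_longer() since tidyr 1.0.0. A minimal sketch of the equivalent reshape, assuming tidyr >= 1.0.0 and freshly generated random data (so the values differ from the output above):

```r
library(tidyr)

set.seed(42)
wide <- tibble::tibble(
  Trt = factor(rep(c("A", "B", "C", "D", "E"), each = 8)),
  V1 = rnorm(40), V2 = rnorm(40), V3 = rnorm(40)
)

# Same reshape as gather("index", "y", -Trt)
long <- pivot_longer(wide, cols = -Trt, names_to = "index", values_to = "y")
long
```

One difference: pivot_longer() orders rows by original row (V1, V2, V3 for row 1, then row 2, and so on), whereas gather() stacks whole columns; the nest/map step does not depend on this order.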

so that I can use the nest/map approach to apply f2 to every V* variable:

data %>% 
  nest(-index) %>% 
  mutate(res = map(data, f2, aov_formula = "y ~ Trt")) %>% 
  unnest(res)
#> # A tibble: 12 x 2
#>    index   res
#>    <chr> <dbl>
#>  1 V1    0.996
#>  2 V1    0.986
#>  3 V1    0.781
#>  4 V1    0.721
#>  5 V2    1.000
#>  6 V2    0.798
#>  7 V2    0.965
#>  8 V2    1.000
#>  9 V3    0.949
#> 10 V3    0.551
#> 11 V3    0.546
#> 12 V3    0.670

Created on 2019-07-23 by the reprex package (v0.3.0)
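Newer tidyr versions (1.0.0 and later) expect the nested columns to be named, as in `nest(data = -index)`, and unnest() keeps any other list-columns. A sketch of the same pipeline in that syntax follows; `lm_p()` is a hypothetical stand-in for f2 that returns the four unadjusted p-values of levels B to E against level A from lm(), not the Dunnett-adjusted ones.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)

set.seed(42)
long <- tibble::tibble(
  Trt = factor(rep(c("A", "B", "C", "D", "E"), each = 8)),
  V1 = rnorm(40), V2 = rnorm(40), V3 = rnorm(40)
) %>%
  pivot_longer(-Trt, names_to = "index", values_to = "y")

# Hypothetical stand-in for f2: unadjusted p-values of levels B-E vs. level A
lm_p <- function(data, formula) {
  coefs <- summary(lm(as.formula(formula), data = data))$coefficients
  unname(coefs[-1, "Pr(>|t|)"])  # drop the intercept row
}

results <- long %>%
  nest(data = -index) %>%                              # one row per V* variable
  mutate(res = map(data, lm_p, formula = "y ~ Trt")) %>%
  select(index, res) %>%                               # drop the nested data
  unnest(cols = res)                                   # four rows per variable
results
```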

If you don't like the shape of the resulting data frame, you can reshape it using gather() and spread().
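For example, to get one column per V* variable, the long result first needs an identifier for each contrast. A sketch using pivot_wider() (the tidyr >= 1.0.0 successor of spread()); the p-values below are placeholders from runif(), not real results:

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Placeholder result in the same long shape as the nest/map output
set.seed(1)
long_res <- tibble::tibble(
  index = rep(c("V1", "V2", "V3"), each = 4),
  res = runif(12)
)

wide_res <- long_res %>%
  group_by(index) %>%
  mutate(contrast = row_number()) %>%   # 1 to 4 within each variable
  ungroup() %>%
  pivot_wider(names_from = index, values_from = res)
wide_res
```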

Upvotes: 2
