ashcrashbegash

Reputation: 271

Using custom function to apply across multiple groups and subsets

I am having trouble applying a custom function to multiple groups within a data frame and adding the result to the original data with mutate. I am trying to calculate the percent inhibition for each row of data (each observation in the experiment has a value). The difficulty is that the function needs the means of two different groups of values (the positive and negative controls) and then uses those means in every calculation.

In other words, the experimental value is subtracted from the mean of the negative control, and the result is divided by the difference between the mean of the negative control and the mean of the positive control, then multiplied by 100.

Each observation, including the + and - controls, should get a calculated percent inhibition. As a double check, for each experiment (grouping) the mean percent inhibition of the - controls should be around 0 and that of the + controls around 100.

The function:

percent_inhibition <- function(uninhibited, inhibited, unknown){

  uninhibited <- as.vector(uninhibited)
  inhibited <- as.vector(inhibited)
  unknown <- as.vector(unknown)

  mu_u <- mean(uninhibited, na.rm = TRUE)
  mu_i <- mean(inhibited, na.rm = TRUE)    

  percent_inhibition <- (mu_u - unknown)/(mu_u - mu_i)*100
  return(percent_inhibition)
}
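
A quick check with made-up control values shows what the function returns. The numbers below are hypothetical (not from the experiment) and are chosen so that the control means are 10 and 2; the function is repeated in shortened form only so the snippet runs on its own.

```r
# Shortened copy of the function above so this snippet is self-contained
percent_inhibition <- function(uninhibited, inhibited, unknown) {
  mu_u <- mean(uninhibited, na.rm = TRUE)
  mu_i <- mean(inhibited, na.rm = TRUE)
  (mu_u - unknown) / (mu_u - mu_i) * 100
}

# Hypothetical readings, for illustration only
neg_ctrl <- c(9, 11)     # uninhibited (negative control), mean 10
pos_ctrl <- c(1, 3)      # inhibited (positive control), mean 2
readings <- c(10, 6, 2)  # values to convert

result <- percent_inhibition(neg_ctrl, pos_ctrl, readings)
print(result)
#> [1]   0  50 100
```

A reading equal to the negative-control mean maps to 0, one equal to the positive-control mean maps to 100, and values in between scale linearly.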

I have a data frame with several variables: target, box, replicate, and sample type. I can do the calculation on a subset of the data (one target, box, and replicate; see below), but I have not figured out the right way to apply it to all of the data.

subset <- data %>% 
  filter(target == "A", box == "1", replicate == 1) 

uninhib <- subset$value[subset$sample == "uninhib"]

inhib <- subset$value[subset$sample == "inhib"]


pct <- subset %>% 
  mutate(pct = percent_inhibition(uninhib, inhib, .$value))

I have tried group_by with do, and the nest functions, but I don't know how to apply them to my subsetting problem. I'm stuck on the subset of the subset (calculating the control means) and then applying those means to the individual values. I am hoping there is an elegant way to do this without all of the manual subsetting.

I have tried:

inhibition <- data %>%
  group_by(target, box, replicate) %>% 
  mutate(pct = (percent_inhibition(.$value[.$sample == "uninhib"], .$value[.$sample == "inhib"], .$value))) 

But I get an error that the columns are not the right length, which seems to be caused by the group_by function.

Upvotes: 1

Views: 257

Answers (1)

Aurèle

Reputation: 12819

library(tidyr)
library(purrr)
library(dplyr)

data %>%
  group_by(target, box, replicate) %>% 
  mutate(pct = {
    x <- split(value, sample)
    percent_inhibition(x$uninhib, x$inhib, value)
  }) 
#> # A tibble: 10,000 x 6
#> # Groups:   target, box, replicate [27]
#>    target box   replicate sample    value     pct
#>    <chr>  <chr>     <int> <chr>     <dbl>   <dbl>
#>  1 A      1             3 inhib   -0.836   1941. 
#>  2 C      1             1 uninhib -0.221   -281. 
#>  3 B      3             2 inhib   -2.10    1547. 
#>  4 C      1             1 uninhib -1.67   -3081. 
#>  5 C      1             3 inhib   -1.10   -1017. 
#>  6 A      2             1 inhib   -1.67     906. 
#>  7 B      3             1 uninhib -0.0495   -57.3
#>  8 C      3             2 inhib    1.56    5469. 
#>  9 B      3             2 uninhib -0.405    321. 
#> 10 B      1             2 inhib    0.786  -3471. 
#> # … with 9,990 more rows

Created on 2019-03-25 by the reprex package (v0.2.1)
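
To illustrate what split() contributes inside the grouped mutate, here is a minimal sketch on two short made-up vectors. The names toy_value and sample_type are hypothetical stand-ins for the value and sample columns of a single group.

```r
# Hypothetical stand-ins for one group's `value` and `sample` columns
toy_value   <- c(1, 2, 3, 4)
sample_type <- c("inhib", "uninhib", "inhib", "uninhib")

# split() returns a named list with one vector of values per sample type
x <- split(toy_value, sample_type)

print(x$inhib)
#> [1] 1 3
print(x$uninhib)
#> [1] 2 4
```

Because mutate() evaluates its expressions once per group after group_by(), value and sample refer only to the current group's rows, so x$uninhib and x$inhib hold that group's controls.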

Or:

data %>%
  group_by(target, box, replicate) %>% 
  mutate(pct = percent_inhibition(value[sample == "uninhib"], 
                                  value[sample == "inhib"], value))
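
The double check described in the question (control means near 0 and 100 within each group) could be sketched as follows. The toy data frame and the single grouping column are assumptions made to keep the example small; with the real data you would group by target, box and replicate instead.

```r
library(dplyr)

# Shortened copy of the question's function so this snippet is self-contained
percent_inhibition <- function(uninhibited, inhibited, unknown) {
  mu_u <- mean(uninhibited, na.rm = TRUE)
  mu_i <- mean(inhibited, na.rm = TRUE)
  (mu_u - unknown) / (mu_u - mu_i) * 100
}

# Hypothetical data: two groups, each with two readings per control type
toy <- tibble(
  target = rep(c("A", "B"), each = 4),
  sample = rep(c("uninhib", "uninhib", "inhib", "inhib"), times = 2),
  value  = c(9, 11, 1, 3, 20, 24, 5, 7)
)

check <- toy %>%
  group_by(target) %>%
  mutate(pct = percent_inhibition(value[sample == "uninhib"],
                                  value[sample == "inhib"], value)) %>%
  # regroup by control type to average the computed percentages
  group_by(target, sample) %>%
  summarise(mean_pct = mean(pct)) %>%
  ungroup()

print(check)
```

When a group has no missing values, these means come out as 0 and 100 up to floating-point error by construction, so the check mainly confirms that the grouping was applied as intended.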

With data as:

n <- 10000L
set.seed(123)
data <-
  tibble(
    target = sample(LETTERS[1:3], n, replace = TRUE),
    box = sample(as.character(1:3), n, replace = TRUE),
    replicate = sample(1:3, n, replace = TRUE),
    sample = sample(c("inhib", "uninhib"), n, replace = TRUE),
    value = rnorm(n)
  )

Upvotes: 1
