Reputation: 271
I am having trouble trying to apply a custom function to multiple groups within a data frame and mutate it to the original data. I am trying to calculate the percent inhibition for each row of data (each observation in the experiment has a value). The challenging issue is that the function needs the mean of two different groups of values (positive and negative controls) and then uses that mean value in each calculation.
In other words, the mean of the negative control is subtracted by the experimental value, then divided by the mean of the negative control minus the positive control.
Each observation including the + and - controls should have a calculated percent inhibition, and as a double check, for each experiment(grouping) the mean of the pct inhib of the - controls should be around 0 and the + controls around 100.
The function:
percent_inhibition <- function(uninhibited, inhibited, unknown){
uninhibited <- as.vector(uninhibited)
inhibited <- as.vector(inhibited)
unknown <- as.vector(unknown)
mu_u <- mean(uninhibited, na.rm = TRUE)
mu_i <- mean(inhibited, na.rm = TRUE)
percent_inhibition <- (mu_u - unknown)/(mu_u - mu_i)*100
return(percent_inhibition)
}
I have a data frame with multiple variables: target, box, replicate, and sample type. I am able to do the calculation by subsetting the data (below), (1 target, box, and replicate) but have not been able to figure out the right way to apply it to all of the data.
subset <- data %>%
filter(target == "A", box == "1", replicate == 1)
uninhib <-
subset$value[subset$sample == "unihib"]
inhib <-
subset$value[subset$sample == "inhib"]
pct <- subset %>%
mutate(pct = percent_inhibition(uninhib, inhib, .$value))
I have tried group_by and do, and nest functions, but my knowledge is lacking in how to apply these functions to my subsetting problem. I'm stuck when it comes to the subset of the subset (calculating the means) and then applying that to the individual values. I am hoping there is an elegant way to do this without all of the subsetting, but I am at a loss on how.
I have tried:
inhibition <- data %>%
group_by(target, box, replicate) %>%
mutate(pct = (percent_inhibition(.$value[.$sample == "uninhib"], .$value[.$sample == "inhib"], .$value)))
But get the error that columns are not the right length, because of the group_by function.
Upvotes: 1
Views: 257
Reputation: 12819
library(tidyr)
library(purrr)
library(dplyr)
data %>%
group_by(target, box, replicate) %>%
mutate(pct = {
x <- split(value, sample)
percent_inhibition(x$uninhib, x$inhib, value)
})
#> # A tibble: 10,000 x 6
#> # Groups: target, box, replicate [27]
#> target box replicate sample value pct
#> <chr> <chr> <int> <chr> <dbl> <dbl>
#> 1 A 1 3 inhib -0.836 1941.
#> 2 C 1 1 uninhib -0.221 -281.
#> 3 B 3 2 inhib -2.10 1547.
#> 4 C 1 1 uninhib -1.67 -3081.
#> 5 C 1 3 inhib -1.10 -1017.
#> 6 A 2 1 inhib -1.67 906.
#> 7 B 3 1 uninhib -0.0495 -57.3
#> 8 C 3 2 inhib 1.56 5469.
#> 9 B 3 2 uninhib -0.405 321.
#> 10 B 1 2 inhib 0.786 -3471.
#> # … with 9,990 more rows
Created on 2019-03-25 by the reprex package (v0.2.1)
Or:
data %>%
group_by(target, box, replicate) %>%
mutate(pct = percent_inhibition(value[sample == "uninhib"],
value[sample == "inhib"], value))
With data
as:
n <- 10000L
set.seed(123) ; data <-
tibble(
target = sample(LETTERS[1:3], n, replace = TRUE),
box = sample(as.character(1:3), n, replace = TRUE),
replicate = sample(1:3, n, replace = TRUE),
sample = sample(c("inhib", "uninhib"), n, replace = TRUE),
value = rnorm(n)
)
Upvotes: 1