Reputation: 27
I'm attempting to use dplyr to analyze experiment data. My current data set represents five patients. For each patient, there are two non-treated samples and four treated samples. I want to average the non-treated samples and then normalize all the observations for each patient to that average.
I'm easily able to get the baseline for each patient:
library(dplyr)
library(magrittr)
baselines <- main_table %>%
  filter(Treatment == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))
What is an efficient way to reference these values when I go back to mutate the main table? Ideally I could use PATIENT.ID to filter/select somehow rather than having to specify the actual patient IDs, which change from one experiment to the next.
What I've been doing is saving the values out of the summarized table and then using those inside mutate, but this solution is UGLY. I really do not like having the patient IDs hard-coded like this, because they change from experiment to experiment and changing them manually introduces errors that are hard to catch.
patient_1_baseline <- baselines[[1, 2]]
patient_2_baseline <- baselines[[2, 2]]
main_table %>%
  mutate(percent_of_baseline = ifelse(
    PATIENT.ID == "108", CD3pos.CD8neg / patient_1_baseline * 100,
    ifelse(PATIENT.ID == "patient_2", ......
Another way to approach this would be to try to group by patient ID, summarize to get the baseline, and then mutate, but I cannot quite figure out how to do that either.
This is ultimately a symptom of a larger problem. I have the tidyverse basics down OK, but I am struggling to move to the next level, where I can handle more complex situations like this one. Any advice about this specific scenario or the big-picture problem is deeply appreciated.
Edited to add: Sample data set
  PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg
1        108    Day 1              42570         24324
2        108    Day 2              36026         20842
3        108    Day 3              40449         22882
4        108    Day 4              52831         32034
5        108      N/A              71348         38340
6        108      N/A              60113         34294
Upvotes: 0
Views: 623
Reputation: 11908
Use left_join() to merge the baselines that you calculated back into the main_table:
main_table %>%
  left_join(baselines, by = "PATIENT.ID")
See e.g. here and here for more about merging data in R.
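To make the join step concrete, here is a minimal sketch of the whole pipeline: compute the per-patient baselines, join them back by PATIENT.ID, then divide. It assumes the non-treated rows are the ones marked "N/A" in Dose.Day, as in the sample data (the question filters on a Treatment column that the sample does not show). Patient 203 and its values are invented purely for illustration.

```r
library(dplyr)

# Hypothetical two-patient table. Patient 108 reuses values from the
# question's sample; patient 203 is made up for illustration only.
main_table <- data.frame(
  PATIENT.ID    = rep(c(108, 203), each = 3),
  Dose.Day      = rep(c("Day1", "N/A", "N/A"), times = 2),
  CD3pos.CD8neg = c(24324, 38340, 34294, 1000, 1800, 2200),
  stringsAsFactors = FALSE
)

# One row per patient: the mean of that patient's non-treated samples.
baselines <- main_table %>%
  filter(Dose.Day == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))

# Join the baseline onto every row of the same patient, then normalize.
normalized <- main_table %>%
  left_join(baselines, by = "PATIENT.ID") %>%
  mutate(percent_of_baseline = CD3pos.CD8neg / mean_CD4 * 100)

print(normalized)
```

No patient ID appears anywhere in the code, so the same script should carry over to the next experiment as long as the column names stay the same.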
Alternatively, skip the separate summary table and compute the baseline directly inside a grouped mutate():
library(tidyverse)
main_table %>%
  group_by(PATIENT.ID) %>%
  mutate(baseline = mean(CD3pos.CD8neg[Dose.Day == "N/A"])) %>%
  mutate(pctbl = CD3pos.CD8neg / baseline * 100)
#> # A tibble: 6 x 6
#> # Groups: PATIENT.ID [1]
#> PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg baseline pctbl
#> <int> <chr> <int> <int> <dbl> <dbl>
#> 1 108 Day1 42570 24324 36317 67.0
#> 2 108 Day2 36026 20842 36317 57.4
#> 3 108 Day3 40449 22882 36317 63.0
#> 4 108 Day4 52831 32034 36317 88.2
#> 5 108 N/A 71348 38340 36317 106.
#> 6 108 N/A 60113 34294 36317 94.4
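If several measurement columns need the same treatment, the grouped mutate() above can probably be extended with across(). The sketch below is one way to do that, not the only one. It assumes dplyr 1.0.0 or later (across() is not available in older versions), the helper pct_of_baseline() is a hypothetical name introduced here, and patient 203 is invented for illustration.

```r
library(dplyr)

# Hypothetical two-patient table. Patient 108 reuses values from the
# question's sample; patient 203 is made up for illustration only.
main_table <- data.frame(
  PATIENT.ID         = rep(c(108, 203), each = 3),
  Dose.Day           = rep(c("Day1", "N/A", "N/A"), times = 2),
  Single.Live.Lymphs = c(42570, 71348, 60113, 5000, 9000, 11000),
  CD3pos.CD8neg      = c(24324, 38340, 34294, 1000, 1800, 2200),
  stringsAsFactors = FALSE
)

# Hypothetical helper: express x as a percentage of the mean of its
# baseline (non-treated) entries.
pct_of_baseline <- function(x, is_baseline) {
  x / mean(x[is_baseline]) * 100
}

result <- main_table %>%
  group_by(PATIENT.ID) %>%
  # Within each patient, normalize both columns to that patient's own
  # non-treated mean; new columns get a "_pct" suffix.
  mutate(across(
    c(Single.Live.Lymphs, CD3pos.CD8neg),
    ~ pct_of_baseline(.x, Dose.Day == "N/A"),
    .names = "{.col}_pct"
  )) %>%
  ungroup()

print(result)
```

Because the data are grouped, Dose.Day == "N/A" is evaluated separately for each patient, so every patient is scaled by its own baseline without any IDs being written into the code.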
Data:
txt <- "
PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg
1 108 Day1 42570 24324
2 108 Day2 36026 20842
3 108 Day3 40449 22882
4 108 Day4 52831 32034
5 108 N/A 71348 38340
6 108 N/A 60113 34294"
main_table <- read.table(text = txt, header = TRUE,
                         stringsAsFactors = FALSE)
Created on 2018-07-11 by the reprex package (v0.2.0.9000).
Upvotes: 1