L_Iguana

Reputation: 27

Use summarized value to normalize in Tidyverse pipeline

I'm attempting to use dplyr to analyze experiment data. My current data set represents five patients. For each patient, two samples are non-treated and there are four treated samples. I want to average the non-treated samples and then normalize all the observations for each patient to the average of the non-treated samples.

I'm easily able to get the baseline for each patient:

library(dplyr)
library(magrittr)

baselines <- main_table %>%
    filter(Treatment == "N/A") %>%
    group_by(PATIENT.ID) %>%
    summarize(mean_CD4 = mean(CD3pos.CD8neg))

What is an efficient way to reference these values when I go back to mutate in the main table? Ideally, I would use PATIENT.ID to look up the right baseline rather than having to specify the actual patient IDs, which change from one experiment to the next.

What I've been doing is saving the values out of the summarized table and then using those inside mutate, but this solution is UGLY. I really do not like having the patient IDs hard coded in like this because they change from experiment to experiment and manually changing them introduces errors that are hard to catch.

patient_1_baseline <- baselines[[1, 2]]
patient_2_baseline <- baselines[[2, 2]]

main_table %>%
    mutate(percent_of_baseline = ifelse(
        PATIENT.ID == "108", CD3pos.CD8neg / patient_1_baseline * 100,
        ifelse(PATIENT.ID == "patient_2", ......

Another way to approach this would be to try to group by patient ID, summarize to get the baseline, and then mutate, but I cannot quite figure out how to do that either.

This is ultimately a symptom of a larger problem. I have the tidyverse basics down OK, but I am struggling to move to the next level where I can handle more complex situations like this one. Any advice about this specific scenario or the big-picture problem is deeply appreciated.

Edited to add: Sample data set

  PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg
1        108    Day 1              42570         24324
2        108    Day 2              36026         20842
3        108    Day 3              40449         22882
4        108    Day 4              52831         32034
5        108      N/A              71348         38340
6        108      N/A              60113         34294

Upvotes: 0

Views: 623

Answers (1)

Mikko Marttila

Reputation: 11908

Use left_join() to merge the baselines that you calculated back into the main_table:

main_table %>% 
  left_join(baselines, by = "PATIENT.ID")
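To show how the join feeds into the normalization step, here is a minimal sketch of the whole pipeline. It assumes the untreated rows are marked "N/A" in Dose.Day (as in the sample data; the question's own code filters on a Treatment column instead), and patient 209 with its values is invented purely so that there is more than one patient to join on.

library(dplyr)

# Toy data: patient 108 comes from the question; patient 209 is made up for illustration.
main_table <- data.frame(
  PATIENT.ID    = c(108, 108, 108, 209, 209, 209),
  Dose.Day      = c("Day1", "N/A", "N/A", "Day1", "N/A", "N/A"),
  CD3pos.CD8neg = c(24324, 38340, 34294, 500, 900, 1100),
  stringsAsFactors = FALSE
)

# One baseline per patient, computed from the untreated ("N/A") rows only.
baselines <- main_table %>%
  filter(Dose.Day == "N/A") %>%
  group_by(PATIENT.ID) %>%
  summarize(mean_CD4 = mean(CD3pos.CD8neg))

# Attach each patient's baseline to all of that patient's rows, then normalize.
normalized <- main_table %>%
  left_join(baselines, by = "PATIENT.ID") %>%
  mutate(percent_of_baseline = CD3pos.CD8neg / mean_CD4 * 100)

Because the join matches on PATIENT.ID, no patient ID ever needs to be written out by hand, so the same code should carry over to the next experiment unchanged.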

See e.g. here and here for more about merging data in R.


Alternatively, you can skip the separate baselines table entirely and compute the baseline inside a grouped mutate():

library(tidyverse)

main_table %>% 
  group_by(PATIENT.ID) %>% 
  mutate(baseline = mean(CD3pos.CD8neg[Dose.Day == "N/A"])) %>% 
  mutate(pctbl = CD3pos.CD8neg / baseline * 100)
#> # A tibble: 6 x 6
#> # Groups:   PATIENT.ID [1]
#>   PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg baseline pctbl
#>        <int> <chr>                 <int>         <int>    <dbl> <dbl>
#> 1        108 Day1                  42570         24324    36317  67.0
#> 2        108 Day2                  36026         20842    36317  57.4
#> 3        108 Day3                  40449         22882    36317  63.0
#> 4        108 Day4                  52831         32034    36317  88.2
#> 5        108 N/A                   71348         38340    36317 106. 
#> 6        108 N/A                   60113         34294    36317  94.4

Data:

txt <- "
PATIENT.ID Dose.Day Single.Live.Lymphs CD3pos.CD8neg
1      108     Day1              42570         24324
2      108     Day2              36026         20842
3      108     Day3              40449         22882
4      108     Day4              52831         32034
5      108      N/A              71348         38340
6      108      N/A              60113         34294"

main_table <- read.table(text = txt, header = TRUE,
                         stringsAsFactors = FALSE)

Created on 2018-07-11 by the reprex package (v0.2.0.9000).
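One thing to watch with the grouped mutate() approach: if a patient has no untreated rows, the subset passed to mean() is empty and the baseline comes out as NaN. The following is only a sketch of one way to flag that case; patient 301 and its values are hypothetical, and the n_untreated column is a helper name introduced here.

library(dplyr)

# Hypothetical data: patient 301 has no "N/A" rows at all.
main_table <- data.frame(
  PATIENT.ID    = c(108, 108, 108, 301, 301),
  Dose.Day      = c("Day1", "N/A", "N/A", "Day1", "Day2"),
  CD3pos.CD8neg = c(24324, 38340, 34294, 700, 800),
  stringsAsFactors = FALSE
)

checked <- main_table %>%
  group_by(PATIENT.ID) %>%
  mutate(
    n_untreated = sum(Dose.Day == "N/A"),               # untreated samples per patient
    baseline = mean(CD3pos.CD8neg[Dose.Day == "N/A"]),  # NaN when n_untreated is 0
    pctbl = CD3pos.CD8neg / baseline * 100
  ) %>%
  ungroup()

# Patients that cannot be normalized because they have no untreated samples.
missing_baseline <- checked %>%
  filter(n_untreated == 0) %>%
  distinct(PATIENT.ID)

Checking missing_baseline before going further makes a missing control visible up front instead of leaving NaN values to surface later in the analysis.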

Upvotes: 1
