Compute variable according to factor levels

Question

I am fairly new to R and to programming in general. I am currently struggling with a piece of data-transformation code and hope someone can take a little time to help me.

Below is a reproducible example:

    # Data
    a <- rnorm(12, 20)
    b <- rnorm(12, 25)
    f1 <- rep(c("X", "Y", "Z"), each = 4)   # family
    f2 <- rep(x = c(0, 1, 50, 100), 3)      # reference and test levels

    dt <- data.frame(f1 = factor(f1), f2 = factor(f2), a, b)

    # library loading
    library(tidyverse)

Goal: normalise every value (a, b) by a reference value. The calculation should be a/a_ref, where a_ref is the value of a in the row with f2 == 0 of the same family (f1 can be X, Y or Z).

I tried to solve this with the following code:
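To make the intended formula concrete, here is a minimal base-R sketch on a small hypothetical data frame with fixed values (not the random `dt` above), so the ratios can be checked by hand. It uses `match()` to look up each family's reference row; that lookup is my own illustration, not the method from the answer below.

```r
# Hypothetical toy data: one reference row (f2 == 0) per family
toy <- data.frame(
  f1 = factor(c("X", "X", "Y", "Y")),
  f2 = factor(c(0, 1, 0, 1)),
  a  = c(10, 15, 4, 2)
)

# The reference rows, one per family
ref_rows <- toy[toy$f2 == "0", ]

# For every row, look up a_ref from the reference row of the same family
toy$a_ref <- ref_rows$a[match(toy$f1, ref_rows$f1)]
toy$ratio <- toy$a / toy$a_ref
toy$ratio
#> [1] 1.0 1.5 1.0 0.5
```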

    test <- filter(dt, f2 != 0) %>%
      group_by(f1) %>%
      mutate("a/a_ref" = a / (filter(dt, f2 == 0) %>% group_by(f1) %>% distinct(a) %>% pull()))

I get:

[screenshot: test results]

As you can see, a is divided by a_ref, but my script seems to recycle the reference values (a_ref) regardless of the family f1.

Do you have any suggestions for computing a/a_ref with respect to the family (f1)?

Thank you for reading!
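A likely explanation, sketched with hypothetical toy numbers: the inner pipeline returns all three reference values at once, and after `filter(dt, f2 != 0)` each family has exactly three rows. Inside a grouped `mutate()`, a length-3 vector divided by another length-3 vector is computed element-wise, so row 1 of every family is divided by X's reference, row 2 by Y's and row 3 by Z's. Because the sizes happen to match, no error is raised.

```r
library(dplyr)

# Hypothetical toy data: three families, three test rows each
toy <- data.frame(
  f1 = factor(rep(c("X", "Y", "Z"), each = 3)),
  a  = c(2, 4, 6, 3, 6, 9, 5, 10, 15)
)
# Hypothetical reference values, one per family, as the inner pipeline returns them
refs <- c(X = 2, Y = 3, Z = 5)

# What the original code effectively does: every 3-row group is divided
# element-wise by the whole vector c(2, 3, 5)
wrong <- toy %>%
  group_by(f1) %>%
  mutate(ratio = a / unname(refs)) %>%
  ungroup()

# What was intended: each row divided by the reference of its own family
right <- toy %>%
  mutate(ratio = a / unname(refs[as.character(f1)]))

wrong$ratio[4:6]  # Y rows divided by 2, 3 and 5 instead of 3, 3, 3
#> [1] 1.5 2.0 1.8
right$ratio
#> [1] 1 2 3 1 2 3 1 2 3
```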

EDIT

I found a way to do it 'manually':

    filter(dt, f1 == "X") %>%
      mutate("a/a_ref" = a / (filter(dt, f1 == "X" & f2 == 0) %>% distinct(a) %>% pull()))

      f1  f2        a        b   a/a_ref
    1  X   0 21.77605 24.53115 1.0000000
    2  X   1 20.17327 24.02512 0.9263973
    3  X  50 19.81482 25.58103 0.9099366
    4  X 100 19.90205 24.66322 0.9139422

The problem is that I would have to repeat this code for each variable and each family, so it is not a clean way to do it.
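One way to avoid repeating the manual code per family is to store the reference values in a small lookup table and join it back onto the data. This is a sketch of a different technique (a join with dplyr's `left_join()`), assuming dplyr is installed; the `a_ref`/`b_ref` and `*_ratio` column names and the seed are arbitrary choices of mine.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dt <- data.frame(
  f1 = factor(rep(c("X", "Y", "Z"), each = 4)),
  f2 = factor(rep(c(0, 1, 50, 100), 3)),
  a  = rnorm(12, 20),
  b  = rnorm(12, 25)
)

# One row per family holding the reference (f2 == 0) values of a and b
refs <- dt %>%
  filter(f2 == "0") %>%
  select(f1, a_ref = a, b_ref = b)

# Attach each family's reference values to every row, then divide
res <- dt %>%
  left_join(refs, by = "f1") %>%
  mutate(a_ratio = a / a_ref,
         b_ratio = b / b_ref)
```

Adding a new variable then only means adding it to `select()` and `mutate()`, not writing another pipeline per family.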

AntoniosK · Accepted Answer

    # use this to reproduce the same dataset and results
    set.seed(5)

    # Data
    a <- rnorm(12, 20)
    b <- rnorm(12, 25)
    f1 <- rep(c("X", "Y", "Z"), each = 4)   # family
    f2 <- rep(x = c(0, 1, 50, 100), 3)      # reference and test levels

    dt <- data.frame(f1 = factor(f1), f2 = factor(f2), a, b)

    # library loading
    library(tidyverse)

    dt %>%
      group_by(f1) %>%                 # for each f1 value
      mutate(a_ref = a[f2 == 0],       # get the a_ref and add it in each row
             "a/a_ref" = a / a_ref) %>% # divide a and a_ref
      ungroup() %>%                    # forget the grouping
      filter(f2 != 0)                  # remove rows where f2 == 0

    # # A tibble: 9 x 6
    #       f1     f2        a        b    a_ref `a/a_ref`
    # 1      X      1 21.38436 24.84247 19.15914 1.1161437
    # 2      X     50 18.74451 23.92824 19.15914 0.9783583
    # 3      X    100 20.07014 24.86101 19.15914 1.0475490
    # 4      Y      1 19.39709 22.81603 21.71144 0.8934042
    # 5      Y     50 19.52783 25.24082 21.71144 0.8994260
    # 6      Y    100 19.36463 24.74064 21.71144 0.8919090
    # 7      Z      1 20.13811 25.94187 19.71423 1.0215013
    # 8      Z     50 21.22763 26.46796 19.71423 1.0767671
    # 9      Z    100 19.19822 25.70676 19.71423 0.9738257
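The grouped `a[f2 == 0]` relies on every family having exactly one `f2 == 0` row. As far as I know, recent dplyr versions stop with a size-mismatch error when a group has none or several, so a check up front can give a more direct message. `check_reference_rows()` below is a hypothetical helper, not part of dplyr, and it assumes `f1` is a factor so that families with zero reference rows still show up in `table()`.

```r
# Hypothetical helper: stop unless every family has exactly one f2 == 0 row
check_reference_rows <- function(df) {
  # table() on a factor counts every level, including families with 0 matches
  n_ref <- table(df$f1[df$f2 == "0"])
  bad <- names(n_ref)[n_ref != 1]
  if (length(bad) > 0) {
    stop("families without exactly one reference row: ",
         paste(bad, collapse = ", "))
  }
  invisible(df)
}

# Usage on hypothetical data: the second data frame lacks Y's reference row
good <- data.frame(f1 = factor(c("X", "X", "Y", "Y")),
                   f2 = factor(c(0, 1, 0, 1)),
                   a  = c(1, 2, 3, 4))
df_bad <- good[-3, ]

check_reference_rows(good)  # passes silently
msg <- tryCatch(check_reference_rows(df_bad), error = conditionMessage)
msg
#> [1] "families without exactly one reference row: Y"
```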

To do this for more than one variable, you can use:

    dt %>%
      group_by(f1) %>%
      # ~ . / .[f2 == 0] is a formula lambda; funs() is deprecated since dplyr 0.8.0
      mutate_at(vars(a:b), ~ . / .[f2 == 0]) %>%
      ungroup()

More generally, vars(a:z) selects every column from a to z, as long as those columns are adjacent in your dataset.

Another option is mutate_if():
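On dplyr 1.0.0 or later, `mutate_at()` is superseded by `across()`. A sketch of the same multi-column normalisation with `across()`, assuming dplyr ≥ 1.0.0; the `.names` argument keeps the original `a` and `b` and writes the ratios to new `*_ratio` columns, which is a naming choice of mine.

```r
library(dplyr)

set.seed(5)
dt <- data.frame(
  f1 = factor(rep(c("X", "Y", "Z"), each = 4)),
  f2 = factor(rep(c(0, 1, 50, 100), 3)),
  a  = rnorm(12, 20),
  b  = rnorm(12, 25)
)

res <- dt %>%
  group_by(f1) %>%
  # a:b is the same contiguous column range as vars(a:b);
  # within each family, divide by the value on the f2 == 0 row
  mutate(across(a:b, ~ .x / .x[f2 == "0"], .names = "{.col}_ratio")) %>%
  ungroup()
```

Keeping the raw values next to the ratios makes it easier to spot a wrong reference row.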

    dt %>%
      group_by(f1) %>%
      # ~ . / .[f2 == 0] is a formula lambda; funs() is deprecated since dplyr 0.8.0
      mutate_if(is.numeric, ~ . / .[f2 == 0]) %>%
      ungroup()

Here the function is applied to every numeric column. Since f1 and f2 are factors, they are excluded.
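One caveat: `is.numeric` only skips `f2` because it is stored as a factor. If `f2` were a plain number, it would be selected and transformed too. A sketch of excluding it explicitly with `across()` and tidyselect's `where()`, assuming dplyr ≥ 1.0.0; `dt_num` is a hypothetical variant of `dt` with a numeric `f2`.

```r
library(dplyr)

set.seed(5)
# Hypothetical variant: f2 stored as a number instead of a factor
dt_num <- data.frame(
  f1 = factor(rep(c("X", "Y", "Z"), each = 4)),
  f2 = rep(c(0, 1, 50, 100), 3),
  a  = rnorm(12, 20),
  b  = rnorm(12, 25)
)

res <- dt_num %>%
  group_by(f1) %>%
  # where(is.numeric) alone would also pick up f2, so exclude it with !f2;
  # the grouping column f1 is never selected by across()
  mutate(across(where(is.numeric) & !f2, ~ .x / .x[f2 == 0])) %>%
  ungroup()
```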
