basti41a

Reputation: 193

Divide different groups by reference group

This is my data frame so far (the result column is what I want to compute; I don't have it yet):

df <- data.frame(number = c(1,1,1,1,2,2,2,2,3,3,3,3),
                 value1 = c(5,7,6,9,3,5,6,3,4,5,5,6),
                 group = c("control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3"),
                 result = c(1,1.4,1.2,1.8,1.0,1.67,2,1,1,1.25,1.25,1.5))

   number value1    group result
1       1      5  control   1.00
2       1      7 Treated1   1.40
3       1      6 Treated2   1.20
4       1      9 Treated3   1.80
5       2      3  control   1.00
6       2      5 Treated1   1.67
7       2      6 Treated2   2.00
8       2      3 Treated3   1.00
9       3      4  control   1.00
10      3      5 Treated1   1.25
11      3      5 Treated2   1.25
12      3      6 Treated3   1.50

I want to group the data by number and, within each number, divide the value1 of every group by the value1 of that number's control, but I'm struggling to achieve this. For example:

Line1: 5/5 = 1.0
Line2: 7/5 = 1.40
Line3: 6/5 = 1.20
Line4: 9/5 = 1.80
Line5: 3/3 = 1.0

I tried something like this, which obviously does not work:

library(dplyr)
df <- df %>%
   group_by(number) %>%
   mutate(result = value1[group == contains("Treated")] / value1[group == control)

Do you have any ideas?

Upvotes: 3

Views: 225

Answers (2)

zephryl

Reputation: 17069

If you expect exactly one "control" per group, consider indexing with [[which(group == "control")]] rather than [group == "control"]. This is less succinct and likely a bit slower than @benson23's solution, but if "control" appears more than once in a group, [[ will alert you by throwing an error.

For instance, say you forget to group your data by number. [[ appropriately throws an error:

library(dplyr)

df %>% mutate(result = value1/value1[[which(group == "control")]])
# Error in `mutate()`:
#   ℹ In argument: `result = value1/value1[[which(group == "control")]]`.
# Caused by error in `value1[[which(group == "control")]]`:
#   ! attempt to select more than one element in vectorIndex

whereas [ silently returns incorrect output:

df %>% mutate(result = value1/value1[group == "control"])
#    number value1    group   result
# 1       1      5  control 1.000000
# 2       1      7 Treated1 2.333333
# 3       1      6 Treated2 1.500000
# 4       1      9 Treated3 1.800000
# 5       2      3  control 1.000000
# 6       2      5 Treated1 1.250000
# 7       2      6 Treated2 1.200000
# 8       2      3 Treated3 1.000000
# 9       3      4  control 1.000000
# 10      3      5 Treated1 1.000000
# 11      3      5 Treated2 1.666667
# 12      3      6 Treated3 1.500000

This illustrates why it's often a good idea to use [[ when exactly one value is expected.
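
For completeness, here is a minimal sketch of the same [[ indexing applied to correctly grouped data. It assumes the df from the question, rebuilt here without the result column so the block runs on its own; `res` and `expected` are names introduced only for this illustration. With exactly one "control" row per number, the lookup should succeed in every group:

```r
library(dplyr)

# Question's data, without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

res <- df %>%
  group_by(number) %>%
  # [[ ]] requires exactly one index, so a group with two "control" rows errors
  mutate(result = value1 / value1[[which(group == "control")]]) %>%
  ungroup()

# Ratios computed by hand: each value1 divided by its group's control value
expected <- c(5/5, 7/5, 6/5, 9/5, 3/3, 5/3, 6/3, 3/3, 4/4, 5/4, 5/4, 6/4)
```

Comparing `res$result` with `expected` via `all.equal()` is a quick way to confirm the grouping took effect.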

Upvotes: 1

benson23

Reputation: 19097

Within each number, you can select the value1 where group == "control" and divide every value1 in that group by it.

library(dplyr)

df %>% group_by(number) %>% mutate(result = value1/value1[group == "control"])

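Or you can sort by the group column so that "control" is the first row within each number, then divide by first(value1). Note that this relies on "control" sorting before the "Treated" labels, which depends on the collation in use: in the C locale (which, as far as I know, arrange() uses by default since dplyr 1.1.0) uppercase letters sort before lowercase ones, so check the row order before relying on first().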

df %>% group_by(number) %>% 
  arrange(number, group) %>% 
  mutate(result = value1/first(value1))

Output

# A tibble: 12 × 4
# Groups:   number [3]
   number value1 group    result
    <dbl>  <dbl> <chr>     <dbl>
 1      1      5 control    1   
 2      1      7 Treated1   1.4 
 3      1      6 Treated2   1.2 
 4      1      9 Treated3   1.8 
 5      2      3 control    1   
 6      2      5 Treated1   1.67
 7      2      6 Treated2   2   
 8      2      3 Treated3   1   
 9      3      4 control    1   
10      3      5 Treated1   1.25
11      3      5 Treated2   1.25
12      3      6 Treated3   1.5 
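
As an alternative that does not depend on row order at all, the control values can be pulled into their own lookup table and joined back by number. This is only a sketch and not one of the approaches given above; it assumes the question's df (rebuilt here without the result column), and `controls`, `control_value` and `res_join` are hypothetical names chosen for illustration.

```r
library(dplyr)

# Question's data, without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

# One row per number holding that number's control value
controls <- df %>%
  filter(group == "control") %>%
  select(number, control_value = value1)

# Attach the control value to every row, then divide
res_join <- df %>%
  left_join(controls, by = "number") %>%
  mutate(result = value1 / control_value)
```

If a number had more than one "control" row, the join would duplicate rows, so comparing `nrow(res_join)` with `nrow(df)` is a cheap sanity check.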

Upvotes: 1
