Reputation: 193
So far I have this df (without the column result):
df <- data.frame(number = c(1,1,1,1,2,2,2,2,3,3,3,3),
value1 = c(5,7,6,9,3,5,6,3,4,5,5,6),
group = c("control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3","control", "Treated1", "Treated2", "Treated3"),
result = c(1,1.4,1.2,1.8,1.0,1.67,2,1,1,1.25,1.25,1.5))
number value1 group result
1 1 5 control 1.00
2 1 7 Treated1 1.40
3 1 6 Treated2 1.20
4 1 9 Treated3 1.80
5 2 3 control 1.00
6 2 5 Treated1 1.67
7 2 6 Treated2 2.00
8 2 3 Treated3 1.00
9 3 4 control 1.00
10 3 5 Treated1 1.25
11 3 5 Treated2 1.25
12 3 6 Treated3 1.50
I want to group the data by number and also by group, and then divide each subgroup of group by the control of the same number group, but I'm struggling to achieve this.
e.g.
Line1: 5/5 = 1.0
Line2: 7/5 = 1.40
Line3: 6/5 = 1.20
Line4: 9/5 = 1.80
Line5: 3/3 = 1.0
I tried something like this (which obviously does not work):
library(dplyr)
df <- df %>%
group_by(number) %>%
mutate(result = value1[group == contains("Treated")] / value1[group == control)
Do you have any ideas?
Upvotes: 3
Views: 225
Reputation: 17069
If your expectation is there should always be just one "control" per group, consider indexing by [[which(group == "control")]] rather than [group == "control"]. This is less succinct and likely a bit slower than @benson23's solution. But if "control" appears more than once in a group, [[ will alert you by throwing an error.
For instance, say you forget to group your data by number. [[ appropriately throws an error:
library(dplyr)
df %>% mutate(result = value1/value1[[which(group == "control")]])
# Error in `mutate()`:
# ℹ In argument: `result = value1/value1[[which(group == "control")]]`.
# Caused by error in `value1[[which(group == "control")]]`:
# ! attempt to select more than one element in vectorIndex
whereas [ silently returns incorrect output:
df %>% mutate(result = value1/value1[group == "control"])
# number value1 group result
# 1 1 5 control 1.000000
# 2 1 7 Treated1 2.333333
# 3 1 6 Treated2 1.500000
# 4 1 9 Treated3 1.800000
# 5 2 3 control 1.000000
# 6 2 5 Treated1 1.250000
# 7 2 6 Treated2 1.200000
# 8 2 3 Treated3 1.000000
# 9 3 4 control 1.000000
# 10 3 5 Treated1 1.000000
# 11 3 5 Treated2 1.666667
# 12 3 6 Treated3 1.500000
This illustrates why it's often a good idea to use [[ when exactly one value is expected.
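For completeness, here is a sketch of the grouped version, where [[ finds exactly one "control" per number and the division works as intended. It assumes the df from the question (rebuilt here without the result column) and that dplyr is installed; the name res is only for illustration.

```r
library(dplyr)

# Rebuild the question's data without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

res <- df %>%
  group_by(number) %>%
  # Within each number, which() returns a single position, so [[ succeeds
  mutate(result = value1 / value1[[which(group == "control")]]) %>%
  ungroup()
```

If a number group ever contained two "control" rows, the same call would stop with the error shown above instead of recycling the divisors.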
Upvotes: 1
Reputation: 19097
You can index the value1 that has group == "control", and divide every value1 in the same number group by this value.
library(dplyr)
df %>% group_by(number) %>% mutate(result = value1/value1[group == "control"])
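Or you can arrange the rows so that "control" will always be the first value within each number. Sorting on the group column itself is fragile: as far as I know, since dplyr 1.1.0 arrange() sorts character vectors in the C locale by default, which puts uppercase "Treated1" before lowercase "control". Sorting on group != "control" avoids this, because FALSE sorts before TRUE.
df %>% group_by(number) %>%
arrange(number, group != "control") %>%
mutate(result = value1/first(value1))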
# A tibble: 12 × 4
# Groups: number [3]
number value1 group result
<dbl> <dbl> <chr> <dbl>
1 1 5 control 1
2 1 7 Treated1 1.4
3 1 6 Treated2 1.2
4 1 9 Treated3 1.8
5 2 3 control 1
6 2 5 Treated1 1.67
7 2 6 Treated2 2
8 2 3 Treated3 1
9 3 4 control 1
10 3 5 Treated1 1.25
11 3 5 Treated2 1.25
12 3 6 Treated3 1.5
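Both approaches above assume exactly one "control" row per number. A minimal base R sketch of a sanity check you could run first; the name controls_per_number is just illustrative, and the data frame is the one from the question without the result column.

```r
# Rebuild the question's data without the hand-computed result column
df <- data.frame(
  number = rep(c(1, 2, 3), each = 4),
  value1 = c(5, 7, 6, 9, 3, 5, 6, 3, 4, 5, 5, 6),
  group  = rep(c("control", "Treated1", "Treated2", "Treated3"), times = 3)
)

# Count the "control" rows within each level of number
controls_per_number <- tapply(df$group == "control", df$number, sum)

# Stop early if any number group has zero or several controls
stopifnot(all(controls_per_number == 1))
```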
Upvotes: 1