Reputation: 75
I'm trying add a new variable that is based on the observation for one level of a factor within a groups in my dataset. I've been trying to utilize various dplyr functions (filter
, select
,mutate
,group_by
) but can't figure out how to get them to work together and accomplish my goal.
here is a sample of my data:
rep rate n mort avg
<fct> <fct> <int> <dbl> <dbl>
1 1 0.747 10 7 0.7
2 1 0.373 10 7 0.7
3 1 0.187 10 6 0.6
4 1 0.0933 10 0 0
5 1 0.00 10 1 0.1
6 2 0.747 10 7 0.7
7 2 0.373 10 5 0.5
8 2 0.187 10 1 0.1
9 2 0.0933 10 4 0.4
10 2 0.00 10 0 0
What I'm hoping to accomplish is to create a new variable called cont
that is derived from the avg
variable when rate == "0.00"
. This variable would be the same for each observation within the same rep
group. The final product would be a table similar to the one below:
rep rate n mort avg cont
<fct> <fct> <int> <dbl> <dbl> <dbl>
1 1 0.747 10 7 0.7 0.1
2 1 0.373 10 7 0.7 0.1
3 1 0.187 10 6 0.6 0.1
4 1 0.0933 10 0 0 0.1
5 1 0.00 10 1 0.1 0.1
6 2 0.747 10 7 0.7 0
7 2 0.373 10 5 0.5 0
8 2 0.187 10 1 0.1 0
9 2 0.0933 10 4 0.4 0
10 2 0.00 10 0 0 0
I've tried the following code: data %>% group_by(rep) %>% filter(rate =="0.00") %>% select(avg)
which results in a dataframe with the data that I do want added as the new variable:
rep avg
<fct> <dbl>
1 1 0.1
2 2 0
3 3 0.1
4 4 0.3
5 5 0
6 6 0
7 7 0
8 8 0
My problem now is that I have no idea how to create the new variable for each observation within the rep
group. I'm not sure how to use mutate
properly in this situation. Thank you in advance for any help!
Upvotes: 2
Views: 791
Reputation: 886938
We can use match
library(dplyr)
df %>%
group_by(rep) %>%
mutate(cont = avg[match("0.00", rate)])
# A tibble: 10 x 6
# Groups: rep [2]
# rep rate n mort avg cont
# <fct> <fct> <int> <dbl> <dbl> <dbl>
# 1 1 0.747 10 7 0.7 0.1
# 2 1 0.373 10 7 0.7 0.1
# 3 1 0.187 10 6 0.6 0.1
# 4 1 0.0933 10 0 0 0.1
# 5 1 0.00 10 1 0.1 0.1
# 6 2 0.747 10 7 0.7 0
# 7 2 0.373 10 5 0.5 0
# 8 2 0.187 10 1 0.1 0
# 9 2 0.0933 10 4 0.4 0
#10 2 0.00 10 0 0 0
Or with data.table
library(data.table)
setDT(df)[, cont := avg[match("0.00", rate)], rep]
Or using the join as @thelatemail suggested
setDT(df)[df[rate=="0.00"], on= .(rep), cont := i.avg]
Note; Both the methods would work even if there are duplicate values as match
returns only the index of the first match.
df <- structure(list(rep = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), rate = structure(c(5L,
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("0.00", "0.0933",
"0.187", "0.373", "0.747"), class = "factor"), n = c(10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mort = c(7, 7, 6, 0,
1, 7, 5, 1, 4, 0), avg = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1,
0.4, 0)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), class = "data.frame")
Upvotes: 1
Reputation: 388817
Assuming there would be only one occurrence of rate == "0.00"
in each group, we can do
library(dplyr)
df %>%
group_by(rep) %>%
mutate(cont = avg[rate == "0.00"])
# rep rate n mort avg cont
# <fct> <fct> <int> <dbl> <dbl> <dbl>
# 1 1 0.747 10 7 0.7 0.1
# 2 1 0.373 10 7 0.7 0.1
# 3 1 0.187 10 6 0.6 0.1
# 4 1 0.0933 10 0 0 0.1
# 5 1 0.00 10 1 0.1 0.1
# 6 2 0.747 10 7 0.7 0
# 7 2 0.373 10 5 0.5 0
# 8 2 0.187 10 1 0.1 0
# 9 2 0.0933 10 4 0.4 0
#10 2 0.00 10 0 0 0
If there are more than one occurrence, we can use which.max
to select the first one
df %>% group_by(rep) %>% mutate(cont = avg[which.max(rate == "0.00")])
Using data.table
, we can do
library(data.table)
setDT(df)[, cont := avg[rate == "0.00"], by = rep]
data
df <- structure(list(rep = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), .Label = c("1", "2"), class = "factor"), rate = structure(c(5L,
4L, 3L, 2L, 1L, 5L, 4L, 3L, 2L, 1L), .Label = c("0.00", "0.0933",
"0.187", "0.373", "0.747"), class = "factor"), n = c(10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), mort = c(7, 7, 6, 0,
1, 7, 5, 1, 4, 0), avg = c(0.7, 0.7, 0.6, 0, 0.1, 0.7, 0.5, 0.1,
0.4, 0)), row.names = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10"), class = "data.frame")
Upvotes: 4