Reputation: 1
I have a data frame that looks like this:
cnpj | time2 | n_act_contracts |
---|---|---|
12 | -1 | 10 |
12 | 0 | 8 |
12 | 1 | 6 |
13 | -1 | 3 |
13 | 0 | 5 |
13 | 1 | 7 |
14 | 1 | 3 |
14 | 2 | 5 |
14 | 3 | 7 |
15 | NA | 3 |
15 | NA | 5 |
15 | NA | 7 |
I want to define another variable that takes, for all observations that have the same cnpj, the value of the n_act_contracts when the variable time2 is equal to zero.
cnpj | time2 | n_act_contracts | zero_n_act_contracts |
---|---|---|---|
12 | -1 | 10 | 8 |
12 | 0 | 8 | 8 |
12 | 1 | 6 | 8 |
13 | -1 | 3 | 5 |
13 | 0 | 5 | 5 |
13 | 1 | 7 | 5 |
14 | 1 | 3 | NA |
14 | 2 | 5 | NA |
14 | 3 | 7 | NA |
15 | NA | 3 | NA |
15 | NA | 5 | NA |
15 | NA | 7 | NA |
I have beeing doing it with the following lines of code, but I need to make it more efficient.
data <- data %>%
group_by(cnpj) %>%
mutate(
zero_n_act_contracts = ifelse(time2 == 0,n_act_contracts,-1000),
zero_n_act_contracts = max(zero_n_act_contracts, na.rm = TRUE),
zero_n_act_contracts = ifelse(zero_n_act_contracts == -1000,NA,zero_n_act_contracts))
obs: I have already tryied replacing base "ifelse" by dplyr: "if_else", but my code took longer to run.
Upvotes: 0
Views: 41
Reputation: 101064
A data.table
option
setDT(df)[,zero_n_act_contracts := n_act_contracts[!time2],cnpj]
gives
> df
cnpj time2 n_act_contracts zero_n_act_contracts
1: 12 -1 10 8
2: 12 0 8 8
3: 12 1 6 8
4: 13 -1 3 5
5: 13 0 5 5
6: 13 1 7 5
7: 14 1 3 NA
8: 14 2 5 NA
9: 14 3 7 NA
10: 15 NA 3 NA
11: 15 NA 5 NA
12: 15 NA 7 NA
Upvotes: 0
Reputation: 886938
We can use
library(dplyr)
data %>%
group_by(cnpj) %>%
mutate(zero_n_act_contracts = n_act_contracts[time2 == 0][1]) %>%
ungroup
-output
# A tibble: 12 x 4
# cnpj time2 n_act_contracts zero_n_act_contracts
# <int> <int> <int> <int>
# 1 12 -1 10 8
# 2 12 0 8 8
# 3 12 1 6 8
# 4 13 -1 3 5
# 5 13 0 5 5
# 6 13 1 7 5
# 7 14 1 3 NA
# 8 14 2 5 NA
# 9 14 3 7 NA
#10 15 NA 3 NA
#11 15 NA 5 NA
#12 15 NA 7 NA
df1 <- structure(list(cnpj = c(12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L,
14L, 15L, 15L, 15L), time2 = c(-1L, 0L, 1L, -1L, 0L, 1L, 1L,
2L, 3L, NA, NA, NA), n_act_contracts = c(10L, 8L, 6L, 3L, 5L,
7L, 3L, 5L, 7L, 3L, 5L, 7L)), class = "data.frame", row.names = c(NA,
-12L))
Upvotes: 0