I have a data frame that looks like this: cnpj time2 n_act_contracts 12 -1 10 12 0 8 12 1 6 13 -1 3 13 0 5 13 1 7 14 1 3 14 2 5 14 3 7 15 NA 3 15 NA 5 15 NA 7 I want to define another variable that takes, for all observations that have the same cnpj, the value of the n_act_contracts when the variable time2 is equal to zero. cnpj time2 n_act_contracts zero_n_act_contracts 12 -1 10 8 12 0 8 8 12 1 6 8 13 -1 3 5 13 0 5 5 13 1 7 5 14 1 3 NA 14 2 5 NA 14 3 7 NA 15 NA 3 NA 15 NA 5 NA 15 NA 7 NA I have beeing doing it with the following lines of code, but I need to make it more efficient. data <- data %>% group_by(cnpj) %>% mutate( zero_n_act_contracts = ifelse(time2 == 0,n_act_contracts,-1000), zero_n_act_contracts = max(zero_n_act_contracts, na.rm = TRUE), zero_n_act_contracts = ifelse(zero_n_act_contracts == -1000,NA,zero_n_act_contracts)) obs: I have already tryied replacing base "ifelse" by dplyr: "if_else", but my code took longer to run.

rdplyrtidyversedata-manipulationgrouped-table

Reputation: 1

Create a new variable for all observations within a group that equals another variable value conditional on a thirs variable value

I have a data frame that looks like this:

cnpj	time2	n_act_contracts
12	-1	10
12	0	8
12	1	6
13	-1	3
13	0	5
13	1	7
14	1	3
14	2	5
14	3	7
15	NA	3
15	NA	5
15	NA	7

I want to define another variable that takes, for all observations that have the same cnpj, the value of the n_act_contracts when the variable time2 is equal to zero.

cnpj	time2	n_act_contracts	zero_n_act_contracts
12	-1	10	8
12	0	8	8
12	1	6	8
13	-1	3	5
13	0	5	5
13	1	7	5
14	1	3	NA
14	2	5	NA
14	3	7	NA
15	NA	3	NA
15	NA	5	NA
15	NA	7	NA

I have beeing doing it with the following lines of code, but I need to make it more efficient.

data <- data %>%
group_by(cnpj) %>%
mutate(
zero_n_act_contracts = ifelse(time2 == 0,n_act_contracts,-1000),
zero_n_act_contracts = max(zero_n_act_contracts, na.rm = TRUE),
zero_n_act_contracts = ifelse(zero_n_act_contracts == -1000,NA,zero_n_act_contracts))

obs: I have already tryied replacing base "ifelse" by dplyr: "if_else", but my code took longer to run.

Upvotes: 0

Answers (2)

ThomasIsCoding

Reputation: 102625

A data.table option

setDT(df)[,zero_n_act_contracts := n_act_contracts[!time2],cnpj]

gives

> df
    cnpj time2 n_act_contracts zero_n_act_contracts
 1:   12    -1              10                    8
 2:   12     0               8                    8
 3:   12     1               6                    8
 4:   13    -1               3                    5
 5:   13     0               5                    5
 6:   13     1               7                    5
 7:   14     1               3                   NA
 8:   14     2               5                   NA
 9:   14     3               7                   NA
10:   15    NA               3                   NA
11:   15    NA               5                   NA
12:   15    NA               7                   NA

Upvotes: 0

akrun

Reputation: 887851

We can use

library(dplyr)
data %>% 
  group_by(cnpj) %>% 
  mutate(zero_n_act_contracts = n_act_contracts[time2 == 0][1]) %>%
  ungroup

-output

# A tibble: 12 x 4
#    cnpj time2 n_act_contracts zero_n_act_contracts
#   <int> <int>           <int>                <int>
# 1    12    -1              10                    8
# 2    12     0               8                    8
# 3    12     1               6                    8
# 4    13    -1               3                    5
# 5    13     0               5                    5
# 6    13     1               7                    5
# 7    14     1               3                   NA
# 8    14     2               5                   NA
# 9    14     3               7                   NA
#10    15    NA               3                   NA
#11    15    NA               5                   NA
#12    15    NA               7                   NA

data

df1 <- structure(list(cnpj = c(12L, 12L, 12L, 13L, 13L, 13L, 14L, 14L, 
14L, 15L, 15L, 15L), time2 = c(-1L, 0L, 1L, -1L, 0L, 1L, 1L, 
2L, 3L, NA, NA, NA), n_act_contracts = c(10L, 8L, 6L, 3L, 5L, 
7L, 3L, 5L, 7L, 3L, 5L, 7L)), class = "data.frame", row.names = c(NA, 
-12L))

Upvotes: 0

cnpj	time2	n_act_contracts	zero_n_act_contracts
12	-1	10	8
12	0	8	8
12	1	6	8
13	-1	3	5
13	0	5	5
13	1	7	5
14	1	3	NA
14	2	5	NA
14	3	7	NA
15	NA	3	NA
15	NA	5	NA
15	NA	7	NA

cnpj	time2	n_act_contracts	zero_n_act_contracts
12	-1	10	8
12	0	8	8
12	1	6	8
13	-1	3	5
13	0	5	5
13	1	7	5
14	1	3	NA
14	2	5	NA
14	3	7	NA
15	NA	3	NA
15	NA	5	NA
15	NA	7	NA

Create a new variable for all observations within a group that equals another variable value conditional on a thirs variable value

Answers (2)

data

Related Questions

cnpj	time2	n_act_contracts	zero_n_act_contracts
12	-1	10	8
12	0	8	8
12	1	6	8
13	-1	3	5
13	0	5	5
13	1	7	5
14	1	3	NA
14	2	5	NA
14	3	7	NA
15	NA	3	NA
15	NA	5	NA
15	NA	7	NA