Jakob
Jakob

Reputation: 163

Insert rows in dataframe based on multiple conditions

I have large dataframe with many groups, one might look like this:

id       tnr     tnr_grp     tnr_a     tnr_b
150001   1       0           1         0
150001   2       0           2         0
150001   3       0           3         0
150001   4       1           4         0
150001   4-5     1           4         5
150001   6       0           6         0
150001   7       0           7         0
150001   19      0           19        0
150001   20      1           20        0
150001   20-21   1           20        21
150001   22      1           22        0
150001   22-23   1           22        23
150001   24      1           24        0
150001   25      0           25        0

I have itentified the overlapping numbers tnr_grp, 1 meaning overlapping, 0 not overlapping. I also split two part numbers into tnr_a and tnr_b. Next, I need to work with the occurences of 0 and 1 in tnr_grp. The problem I encountered is that I sometimes have two overlapping groups right after each other - but which are actually not the same group. As in the example 20 overlaps with 20-21, and 22 overlaps with 22-23. I want to insert a a new row in between these two groups, where the id is preserved, tnr_grp is 0, an all other values are NA. So the desired output should look like this:

id       tnr     tnr_grp     tnr_a     tnr_b
150001   1       0           1         0
150001   2       0           2         0
150001   3       0           3         0
150001   4       1           4         0
150001   4-5     1           4         5
150001   6       0           6         0
150001   7       0           7         0
150001   19      0           19        0
150001   20      1           20        0
150001   20-21   1           20        21
150001   NA      0           NA        NA
150001   22      1           22        0
150001   22-23   1           22        23
150001   24      1           24        0
150001   25      0           25        0

Hence, I need some code to insert a row in between two rows, when tnr_grp is 1 and tnr_a is not the same, preferably with dplyr. I tried a few things, but nothing really leading somewhere.

Code to reproduce the sample data:

df <- structure(list(id = c(150001L, 150001L, 150001L, 150001L, 150001L, 
                        150001L, 150001L, 150001L, 150001L, 150001L, 150001L, 150001L, 
                        150001L, 150001L), tnr = c("1", "2", "3", "4", "4-5", "6", "7", 
                                                   "19", "20", "20-21", "22", "22-23", "24", "25"), tnr_grp = c(0L, 
                                                                                                                0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L), tnr_a = c(1L, 
                                                                                                                                                                               2L, 3L, 4L, 4L, 6L, 7L, 19L, 20L, 20L, 22L, 22L, 24L, 25L), tnr_b = c(0L, 
                                                                                                                                                                                                                                                     0L, 0L, 0L, 5L, 0L, 0L, 0L, 0L, 21L, 0L, 23L, 0L, 0L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                                                                                 -14L))

Any help appreciated, thanks!

Upvotes: 1

Views: 85

Answers (1)

MarBlo
MarBlo

Reputation: 4524

It took me while to understand what you mean - I hope I have it now. But it took me even longer to decide to post what I have. Because I think it is NOT the most elegant solution.

But it is at least one, and as there were no others I decided to post it.

library(dplyr)

nam <- colnames(df)

res <- df %>% 
  mutate(rn = row_number()) %>%                         # row numbers
  group_by(tnr_a) %>%  
  mutate(l = n()) %>%                                   # count per group
  mutate(check = ifelse(l == lag(l),T, F)) %>%          # intermediate 
  na.omit() %>% 
  ungroup() %>% 
  mutate(check2 = rn + l) %>%                           # intermediate
  mutate(check3 = ifelse(check2 == lead(rn), T, F)) %>% # intermediate
  filter(!is.na(check3) & check3 == T) %>%              # final filter
  select(all_of(nam)) %>%                               # select right col
  mutate(tnr = NA, tnr_grp = NA, tnr_b = NA)            # get row and mutate to NA where needed

res
#> # A tibble: 1 x 5
#>       id tnr   tnr_grp tnr_a tnr_b
#>    <int> <lgl> <lgl>   <int> <lgl>
#> 1 150001 NA    NA         20 NA

# bind to original DF
df %>% bind_rows(res) %>%                 
  arrange(tnr_a)
#>        id   tnr tnr_grp tnr_a tnr_b
#> 1  150001     1       0     1     0
#> 2  150001     2       0     2     0
#> 3  150001     3       0     3     0
#> 4  150001     4       1     4     0
#> 5  150001   4-5       1     4     5
#> 6  150001     6       0     6     0
#> 7  150001     7       0     7     0
#> 8  150001    19       0    19     0
#> 9  150001    20       1    20     0
#> 10 150001 20-21       1    20    21
#> 11 150001  <NA>      NA    20    NA
#> 12 150001    22       1    22     0
#> 13 150001 22-23       1    22    23
#> 14 150001    24       0    24     0
#> 15 150001    25       0    25     0

Upvotes: 1

Related Questions