Reputation: 163
I have large dataframe with many groups, one might look like this:
id tnr tnr_grp tnr_a tnr_b
150001 1 0 1 0
150001 2 0 2 0
150001 3 0 3 0
150001 4 1 4 0
150001 4-5 1 4 5
150001 6 0 6 0
150001 7 0 7 0
150001 19 0 19 0
150001 20 1 20 0
150001 20-21 1 20 21
150001 22 1 22 0
150001 22-23 1 22 23
150001 24 1 24 0
150001 25 0 25 0
I have itentified the overlapping numbers tnr_grp
, 1 meaning overlapping, 0 not overlapping. I also split two part numbers into tnr_a
and tnr_b
.
Next, I need to work with the occurences of 0 and 1 in tnr_grp
. The problem I encountered is that I sometimes have two overlapping groups right after each other - but which are actually not the same group. As in the example 20 overlaps with 20-21, and 22 overlaps with 22-23.
I want to insert a a new row in between these two groups, where the id
is preserved, tnr_grp
is 0, an all other values are NA. So the desired output should look like this:
id tnr tnr_grp tnr_a tnr_b
150001 1 0 1 0
150001 2 0 2 0
150001 3 0 3 0
150001 4 1 4 0
150001 4-5 1 4 5
150001 6 0 6 0
150001 7 0 7 0
150001 19 0 19 0
150001 20 1 20 0
150001 20-21 1 20 21
150001 NA 0 NA NA
150001 22 1 22 0
150001 22-23 1 22 23
150001 24 1 24 0
150001 25 0 25 0
Hence, I need some code to insert a row in between two rows, when tnr_grp
is 1 and tnr_a
is not the same, preferably with dplyr. I tried a few things, but nothing really leading somewhere.
Code to reproduce the sample data:
df <- structure(list(id = c(150001L, 150001L, 150001L, 150001L, 150001L,
150001L, 150001L, 150001L, 150001L, 150001L, 150001L, 150001L,
150001L, 150001L), tnr = c("1", "2", "3", "4", "4-5", "6", "7",
"19", "20", "20-21", "22", "22-23", "24", "25"), tnr_grp = c(0L,
0L, 0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L), tnr_a = c(1L,
2L, 3L, 4L, 4L, 6L, 7L, 19L, 20L, 20L, 22L, 22L, 24L, 25L), tnr_b = c(0L,
0L, 0L, 0L, 5L, 0L, 0L, 0L, 0L, 21L, 0L, 23L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-14L))
Any help appreciated, thanks!
Upvotes: 1
Views: 85
Reputation: 4524
It took me while to understand what you mean - I hope I have it now. But it took me even longer to decide to post what I have. Because I think it is NOT the most elegant solution.
But it is at least one, and as there were no others I decided to post it.
library(dplyr)
nam <- colnames(df)
res <- df %>%
mutate(rn = row_number()) %>% # row numbers
group_by(tnr_a) %>%
mutate(l = n()) %>% # count per group
mutate(check = ifelse(l == lag(l),T, F)) %>% # intermediate
na.omit() %>%
ungroup() %>%
mutate(check2 = rn + l) %>% # intermediate
mutate(check3 = ifelse(check2 == lead(rn), T, F)) %>% # intermediate
filter(!is.na(check3) & check3 == T) %>% # final filter
select(all_of(nam)) %>% # select right col
mutate(tnr = NA, tnr_grp = NA, tnr_b = NA) # get row and mutate to NA where needed
res
#> # A tibble: 1 x 5
#> id tnr tnr_grp tnr_a tnr_b
#> <int> <lgl> <lgl> <int> <lgl>
#> 1 150001 NA NA 20 NA
# bind to original DF
df %>% bind_rows(res) %>%
arrange(tnr_a)
#> id tnr tnr_grp tnr_a tnr_b
#> 1 150001 1 0 1 0
#> 2 150001 2 0 2 0
#> 3 150001 3 0 3 0
#> 4 150001 4 1 4 0
#> 5 150001 4-5 1 4 5
#> 6 150001 6 0 6 0
#> 7 150001 7 0 7 0
#> 8 150001 19 0 19 0
#> 9 150001 20 1 20 0
#> 10 150001 20-21 1 20 21
#> 11 150001 <NA> NA 20 NA
#> 12 150001 22 1 22 0
#> 13 150001 22-23 1 22 23
#> 14 150001 24 0 24 0
#> 15 150001 25 0 25 0
Upvotes: 1