denis
denis

Reputation: 802

How to insert a new row for each missing number of one variable while filling with NA for other variables using data.table in R?

How to insert a new row every time there is a break in the y column numbering, while filling the other columns with NA?

Edit: Iroha's suggestion based on tidyr::complete() works fine, as do the others. However, when I apply it to millions of rows, it runs for hours.
It seems like a data.table approach would be faster, but I never use it and I can't seem to find a solution.
Any help/advice would be greatly appreciated!

Initial data:

x <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b")
y <- c(1,1,2,4,4,4,7,7,11,1,3,3,3,6,6,8)
dat0 <- data.frame(x,y)  

>dat0
   x  y
1  a  1
2  a  1
3  a  2
4  a  4
5  a  4
6  a  4
7  a  7
8  a  7
9  a 11
10 b  1
11 b  3
12 b  3
13 b  3
14 b  6
15 b  6
16 b  8  

Desired output:

> dat1
      x  y
1     a  1
2     a  1
3     a  2
4  <NA>  3
5     a  4
6     a  4
7     a  4
8  <NA>  5
9  <NA>  6
10    a  7
11    a  7
12 <NA>  8
13 <NA>  9
14 <NA> 10
15    a 11
16    b  1
17 <NA>  2
18    b  3
19    b  3
20    b  3
21 <NA>  4
22 <NA>  5
23    b  6
24    b  6
25 <NA>  7
26    b  8  

Thanks for help

Upvotes: 0

Views: 156

Answers (4)

langtang
langtang

Reputation: 24845

Here is a data.table approach (in both examples below, be sure to start with setDT(dat0)):

f <- \(k) k[CJ(y=1:max(k$y)), on="y"]
dat0[, f(.SD), x, .SDcols=c("y", "x")][, c(2,3)]

Output:

        y      x
    <int> <char>
 1:     1      a
 2:     1      a
 3:     2      a
 4:     3   <NA>
 5:     4      a
 6:     4      a
 7:     4      a
 8:     5   <NA>
 9:     6   <NA>
10:     7      a
11:     7      a
12:     8   <NA>
13:     9   <NA>
14:    10   <NA>
15:    11      a
16:     1      b
17:     2   <NA>
18:     3      b
19:     3      b
20:     3      b
21:     4   <NA>
22:     5   <NA>
23:     6      b
24:     6      b
25:     7   <NA>
26:     8      b
        y      x

Alternatively, you could do this:

dat0[, x2:=x][dat0[, .(y=1:max(y)),x], on=.(x,y), .(x=x2, y)]

Output:

         x     y
    <char> <int>
 1:      a     1
 2:      a     1
 3:      a     2
 4:   <NA>     3
 5:      a     4
 6:      a     4
 7:      a     4
 8:   <NA>     5
 9:   <NA>     6
10:      a     7
11:      a     7
12:   <NA>     8
13:   <NA>     9
14:   <NA>    10
15:      a    11
16:      b     1
17:   <NA>     2
18:      b     3
19:      b     3
20:      b     3
21:   <NA>     4
22:   <NA>     5
23:      b     6
24:      b     6
25:   <NA>     7
26:      b     8

Upvotes: 2

Tim G
Tim G

Reputation: 4147

Hmm odd requirement, but in base R you could do

res <- dat0    
for(let in unique(dat0$x)){
  missing_y <- setdiff(1:max(dat0[dat0$x==let,]$y), dat0[dat0$x==let,]$y)
  missing_x <- rep(NA, length(missing_y))
  res <- rbind(data.frame(x=missing_x, y = missing_y),res)
}

But I would recommend

library(tidyverse)
res_filled <- dat0 %>% group_by(x) %>% complete(y = seq(min(y),max(y),1))

Upvotes: 1

lroha
lroha

Reputation: 34586

This is pretty straight forward with tidyr::complete() except that grouping variables can't be modified so need to be copied first.

library(tidyr)
library(dplyr)

dat0 |> 
  group_by(g2 = x) |> 
  complete(y = full_seq(y, 1))  |> 
  ungroup(g2) |> 
  select(-g2)

# A tibble: 26 × 2
       y x    
   <dbl> <chr>
 1     1 a    
 2     1 a    
 3     2 a    
 4     3 NA   
 5     4 a    
 6     4 a    
 7     4 a    
 8     5 NA   
 9     6 NA   
10     7 a    
# ℹ 16 more rows
# ℹ Use `print(n = ...)` to see more rows

Upvotes: 2

Edward
Edward

Reputation: 19339

A bit long-winded:

library(dplyr)

reframe(dat0, y=seq(min(y), max(y)), .by=x) %>%
  anti_join(dat0) %>%
  mutate(x2=x, x=NA) %>%
  full_join(dat0) %>%
  mutate(x2=coalesce(x, x2)) %>%
  arrange(x2, y) %>%
  select(-x2)

      x  y
1     a  1
2     a  1
3     a  2
4  <NA>  3
5     a  4
6     a  4
7     a  4
8  <NA>  5
9  <NA>  6
10    a  7
11    a  7
12 <NA>  8
13 <NA>  9
14 <NA> 10
15    a 11
16    b  1
17 <NA>  2
18    b  3
19    b  3
20    b  3
21 <NA>  4
22 <NA>  5
23    b  6
24    b  6
25 <NA>  7
26    b  8

Upvotes: 2

Related Questions