Hong Ooi

Reputation: 57696

Unnest a data frame and fill new rows with NAs

Let's say I have a nested df, and I want to unnest the columns:

df <- tibble::tribble(
    ~x, ~y, ~nestdf,
    1,  2,  tibble::tibble(a=1:2, b=3:4),
    3,  4,  tibble::tibble(a=3:5, b=5:7)
)
tidyr::unnest(df, nestdf)

#      x     y     a     b
#  <dbl> <dbl> <int> <int>
#1     1     2     1     3
#2     1     2     2     4
#3     3     4     3     5
#4     3     4     4     6
#5     3     4     5     7

The result has the x and y columns extended to match the dimensions of nestdf, with the new rows using the existing values. However, I want the new rows to contain NA, like so:

#      x     y     a     b
#  <dbl> <dbl> <int> <int>
#1     1     2     1     3
#2    NA    NA     2     4
#3     3     4     3     5
#4    NA    NA     4     6
#5    NA    NA     5     7

Is it possible to do this with unnest? Either the first or the last row of each group can keep its non-NA values; I don't mind which.

Upvotes: 3

Views: 458

Answers (3)

r.user.05apr

Reputation: 5456

You could convert x and y to lists first:

library(tidyverse)

df <- tibble::tribble(
  ~x, ~y, ~nestdf,
  1,  2,  tibble::tibble(a=1:2, b=3:4),
  3,  4,  tibble::tibble(a=3:5, b=5:7)
)

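df %>%
  # turn x and y into list columns, padding each value with NA
  # up to the number of rows in the matching nested tibble
  mutate_at(vars(x:y), ~map2(., nestdf, ~.x[seq(nrow(.y))])) %>%
  # unnest all list columns together so their lengths line up
  unnest(everything())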

# A tibble: 5 x 4
#      x     y     a     b
#  <dbl> <dbl> <int> <int>
#1     1     2     1     3
#2    NA    NA     2     4
#3     3     4     3     5
#4    NA    NA     4     6
#5    NA    NA     5     7
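
The padding relies on the base R rule that indexing a vector past its length returns NA. A minimal sketch of that single step, with made-up stand-in values (`x_val`, `n_rows` and `padded` are hypothetical names, not part of the answer):

```r
# Hypothetical stand-ins for one row of df:
# a length-1 value and the row count of its nested tibble
x_val <- 3
n_rows <- 3

# Positions 2 and 3 do not exist in x_val, so they come back as NA
padded <- x_val[seq(n_rows)]
print(padded)
```

Each padded element then has the same length as the number of rows in the matching nested tibble, which is what lets unnest(everything()) expand all the list columns side by side.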

Upvotes: 1

thelatemail

Reputation: 93908

Repeat the rows of x and y, using NA as the row index for the extra rows, and bind the result to an unnest of the nested list column(s):

library(tidyr)

# number of extra (NA) rows needed for each original row
nr <- sapply(df$nestdf, nrow) - 1
cbind(
  df[rep(rbind(seq_along(nr), NA), rbind(1, nr)), c("x","y")],
  unnest(df["nestdf"], cols=everything())
)

#   x  y a b
#1  1  2 1 3
#2 NA NA 2 4
#3  3  4 3 5
#4 NA NA 4 6
#5 NA NA 5 7
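
To see how the row index is built, here is a small sketch of the rep()/rbind() step on its own, assuming nested tables of 2 and 3 rows as in the question (`idx`, `times` and `row_index` are hypothetical names used only for illustration):

```r
# nr as computed above for nested tables with 2 and 3 rows
nr <- c(1, 2)

# rbind() builds a 2-row matrix; read column by column it is 1, NA, 2, NA
idx <- rbind(seq_along(nr), NA)

# Matching repeat counts: each row number once, each NA nr times
times <- rbind(1, nr)

row_index <- rep(idx, times)
print(row_index)
```

Indexing df with an NA row index returns a row of NAs, which is where the blank x and y values come from.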

Upvotes: 1

Ronak Shah

Reputation: 389135

One way would be to change the duplicates to NA.

df1 <- tidyr::unnest(df, nestdf) 
cols <- c('x', 'y')
df1[duplicated(df1[cols]), cols] <- NA
df1

#      x     y     a     b
#  <dbl> <dbl> <int> <int>
#1     1     2     1     3
#2    NA    NA     2     4
#3     3     4     3     5
#4    NA    NA     4     6
#5    NA    NA     5     7

If the values in columns x and y can repeat across rows, you can add a row number to identify each original row uniquely:

library(dplyr)
library(tidyr)

df1 <- df %>% mutate(row = row_number()) %>% unnest(nestdf)
cols <- c('x', 'y', 'row')
df1[duplicated(df1[cols]), cols] <- NA
df1 <- select(df1, -row)
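
As a rough illustration of why the row number matters, consider a hypothetical input where both rows share the same x and y (`df_dup`, `plain` and `with_id` are made-up names for this sketch):

```r
library(dplyr)
library(tidyr)

# Hypothetical input: both rows have identical x and y values
df_dup <- tibble::tribble(
    ~x, ~y, ~nestdf,
    1,  2,  tibble::tibble(a=1:2, b=3:4),
    1,  2,  tibble::tibble(a=3:5, b=5:7)
)

# Without a row id, every row after the very first counts as a duplicate,
# so the first row of the second group is blanked as well
plain <- unnest(df_dup, nestdf)
cols <- c('x', 'y')
plain[duplicated(plain[cols]), cols] <- NA

# With a row id, the first row of each original row keeps its values
with_id <- df_dup %>% mutate(row = row_number()) %>% unnest(nestdf)
cols_id <- c('x', 'y', 'row')
with_id[duplicated(with_id[cols_id]), cols_id] <- NA
with_id <- select(with_id, -row)
```

In this sketch, plain keeps x and y only in its first row, while with_id keeps them in rows 1 and 3, one per original row.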

Upvotes: 1
