jackbio

Reputation: 130

Expand `NA`s to multiple rows filling with a grouping variable

I have a data frame in which every SerialNo has one row of missing values (NA), and some SerialNos also have rows with filled data. I need to expand each NA row into two new rows, one for each unique value of PartsUsed ("A" and "B").

I believe I'm on the right track using dplyr's complete or expand, but I can't seem to get the arguments right.

Below are example datasets: the one I currently have and the target data frame.

library(tibble)

begin_ex_df <- tibble(
  SerialNo = c(1234, 5678, 5678, 5678, 9012, 1357, 1357, 1357, 2468, 8080),
  PartsUsed = c(NA, "A", "B", NA, NA, "A", "B", NA, NA, NA),
  Values = c(NA, 10, 15, NA, NA, 11, 14, NA, NA, NA)
)

ending_ex_df <- tibble(
  SerialNo = c(1234, 1234, 5678, 5678, 5678, 5678, 9012, 9012, 1357, 1357, 1357, 1357, 2468, 2468, 8080, 8080),
  PartsUsed = c("A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B"),
  Values = c(0, 0, 10, 15, 0, 0, 0, 0, 11, 14, 0, 0, 0, 0, 0, 0)
)

I expect every NA row to be expanded into an "A" row and a "B" row in PartsUsed, with its Values filled with 0. (Filling the value is not necessary; it can remain NA.) Rows that already have an "A" or "B" should be left alone, so a SerialNo such as 5678 ends up with both its original A/B rows and the two new ones.
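As a quick sanity check on the size of the target: each of the 6 NA rows becomes one row per unique non-missing PartsUsed value (here "A" and "B"), and the 4 rows that already have a part are kept, so the result should have 6 × 2 + 4 = 16 rows, which is the length of ending_ex_df. A minimal sketch of that arithmetic, using the example data above:

```r
library(tibble)

begin_ex_df <- tibble(
  SerialNo = c(1234, 5678, 5678, 5678, 9012, 1357, 1357, 1357, 2468, 8080),
  PartsUsed = c(NA, "A", "B", NA, NA, "A", "B", NA, NA, NA),
  Values = c(NA, 10, 15, NA, NA, 11, 14, NA, NA, NA)
)

is_na_part <- is.na(begin_ex_df$PartsUsed)

n_na    <- sum(is_na_part)                                   # NA rows to expand
n_keep  <- sum(!is_na_part)                                  # rows left alone
n_parts <- length(unique(begin_ex_df$PartsUsed[!is_na_part])) # distinct parts ("A", "B")

expected_rows <- n_na * n_parts + n_keep
expected_rows
#> [1] 16
```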

Upvotes: 3

Views: 120

Answers (2)

Matt

Reputation: 2987

Here is a base R alternative that keeps the duplicated SerialNo rows, as in your target:

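# Rows where PartsUsed is missing: expand each into one "A" and one "B" row
x <- subset(begin_ex_df, is.na(PartsUsed))
x <- expand.grid(SerialNo = x$SerialNo, PartsUsed = c("A", "B"),
                 stringsAsFactors = FALSE)  # keep PartsUsed as character
x$Values <- 0

# Rows that already have a part are kept unchanged
y <- subset(begin_ex_df, !is.na(PartsUsed))

new_df <- rbind(x, y)

new_df[order(new_df$SerialNo), ]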

#  SerialNo PartsUsed Values
#1      1234         A      0
#2      1234         B      0
#3      1357         A      0
#4      1357         B      0
#5      1357         A     11
#6      1357         B     14
#7      2468         A      0
#8      2468         B      0
#9      5678         A      0
#10     5678         B      0
#11     5678         A     10
#12     5678         B     15
#13     8080         A      0
#14     8080         B      0
#15     9012         A      0
#16     9012         B      0
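For comparison, the same split, expand and bind steps can be written with dplyr and tidyr. This is a sketch, not the answerer's code, and it assumes tidyr 1.0 or later for expand_grid(). Unlike base expand.grid(), expand_grid() never converts strings to factors, so PartsUsed stays character without extra arguments.

```r
library(tibble)
library(dplyr)
library(tidyr)

begin_ex_df <- tibble(
  SerialNo = c(1234, 5678, 5678, 5678, 9012, 1357, 1357, 1357, 2468, 8080),
  PartsUsed = c(NA, "A", "B", NA, NA, "A", "B", NA, NA, NA),
  Values = c(NA, 10, 15, NA, NA, 11, 14, NA, NA, NA)
)

# Rows with a missing PartsUsed are the ones to expand
na_rows <- filter(begin_ex_df, is.na(PartsUsed))

# One row per (SerialNo, part) pair, with Values set to 0
expanded <- expand_grid(SerialNo = na_rows$SerialNo, PartsUsed = c("A", "B")) %>%
  mutate(Values = 0)

# Stack with the untouched rows and sort by SerialNo
new_df <- bind_rows(expanded, filter(begin_ex_df, !is.na(PartsUsed))) %>%
  arrange(SerialNo)
```

Because arrange() is stable, within each SerialNo the new zero rows come before the original ones; add PartsUsed or Values to arrange() if a different order is wanted.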

Edit

If you want to use tidyr's complete(), you can group_by() SerialNo together with an index column (Ind) that separates the NA rows from the filled ones, complete within each group, and then drop the remaining NA rows at the end (you can also remove the Ind column).

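library(dplyr)
library(tidyr)

begin_ex_df %>% 
   # Ind marks the NA placeholder row ("no") vs. rows that already have a part ("yes")
   group_by(SerialNo, Ind = if_else(is.na(PartsUsed), "no", "yes")) %>% 
   # SerialNo is constant within each group, so only PartsUsed needs completing;
   # any missing "A"/"B" rows are added with Values = 0
   complete(PartsUsed = c("A", "B"), fill = list(Values = 0)) %>% 
   # Drop the leftover placeholder rows (PartsUsed is still NA)
   na.omit()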

#  Ind   SerialNo PartsUsed Values
#   <chr>    <dbl> <chr>      <dbl>
# 1 no        1234 A              0
# 2 no        1234 B              0
# 3 no        1357 A              0
# 4 no        1357 B              0
# 5 yes       1357 A             11
# 6 yes       1357 B             14
# 7 no        2468 A              0
# 8 no        2468 B              0
# 9 no        5678 A              0
#10 no        5678 B              0
#11 yes       5678 A             10
#12 yes       5678 B             15
#13 no        8080 A              0
#14 no        8080 B              0
#15 no        9012 A              0
#16 no        9012 B              0
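To finish the job and drop the helper column, a sketch along the same lines ungroups and removes Ind. It is a variation rather than the answerer's exact code: it uses filter(!is.na(PartsUsed)) in place of na.omit() so only the placeholder rows are targeted, and it assumes a recent tidyr where complete() works on grouped data frames.

```r
library(tibble)
library(dplyr)
library(tidyr)

begin_ex_df <- tibble(
  SerialNo = c(1234, 5678, 5678, 5678, 9012, 1357, 1357, 1357, 2468, 8080),
  PartsUsed = c(NA, "A", "B", NA, NA, "A", "B", NA, NA, NA),
  Values = c(NA, 10, 15, NA, NA, 11, 14, NA, NA, NA)
)

result <- begin_ex_df %>%
  # Ind separates the NA placeholder row from the rows that already have parts
  group_by(SerialNo, Ind = if_else(is.na(PartsUsed), "no", "yes")) %>%
  # Within each group, make sure both parts exist; new rows get Values = 0
  complete(PartsUsed = c("A", "B"), fill = list(Values = 0)) %>%
  # The original placeholder row still has PartsUsed = NA, so drop it
  filter(!is.na(PartsUsed)) %>%
  # Remove the grouping and the helper column
  ungroup() %>%
  select(-Ind)
```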

Upvotes: 3

iod

Reputation: 7592

complete() is from tidyr, not dplyr, but it's certainly a good direction to take. It does need to be combined with dplyr's filter() to remove the rows where PartsUsed is still NA.

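library(dplyr)  # provides %>%

# Fill in every SerialNo/PartsUsed combination, then drop the NA part rows
tidyr::complete(begin_ex_df, SerialNo, PartsUsed, fill = list(Values = 0)) %>% 
  dplyr::filter(!is.na(PartsUsed))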

# A tibble: 12 x 3
   SerialNo PartsUsed Values
      <dbl> <chr>      <dbl>
 1     1234 A              0
 2     1234 B              0
 3     1357 A             11
 4     1357 B             14
 5     2468 A              0
 6     2468 B              0
 7     5678 A             10
 8     5678 B             15
 9     8080 A              0
10     8080 B              0
11     9012 A              0
12     9012 B              0
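Note that this returns 12 rows rather than the 16 in ending_ex_df: complete() only adds combinations that are missing, so SerialNos 5678 and 1357, which already have "A" and "B", keep one row per part. If the extra zero rows are wanted, one option (a sketch, not part of this answer) is to run complete() on just the NA rows and then bind the other rows back on:

```r
library(tibble)
library(dplyr)
library(tidyr)

begin_ex_df <- tibble(
  SerialNo = c(1234, 5678, 5678, 5678, 9012, 1357, 1357, 1357, 2468, 8080),
  PartsUsed = c(NA, "A", "B", NA, NA, "A", "B", NA, NA, NA),
  Values = c(NA, 10, 15, NA, NA, 11, 14, NA, NA, NA)
)

filled <- begin_ex_df %>%
  # Complete only the placeholder rows, so existing combinations are not merged
  filter(is.na(PartsUsed)) %>%
  complete(SerialNo, PartsUsed = c("A", "B"), fill = list(Values = 0)) %>%
  # complete() keeps the original NA rows as well; drop them
  filter(!is.na(PartsUsed)) %>%
  # Add back the rows that already had parts, then sort
  bind_rows(filter(begin_ex_df, !is.na(PartsUsed))) %>%
  arrange(SerialNo)
```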

Upvotes: 2
