Jamie Warren
Jamie Warren

Reputation: 95

Replace NA values with preexisting value based on multiple conditions

I am working with the following data. It pertains to the dimensions of items and the boxes that they are shipped in.

       Box_Height     Box_Length     Box_Width Item_Height Item_Length Item_Width
 1             NA             74             4          NA          NA         NA
 2             NA             42            NA           6          42          6
 3              6             NA            NA           6          22          6
 4              6             NA            NA           6          42          6
 5              6             NA            NA           6          42          6
 6             NA             NA            NA          NA          NA         NA

According to the shipment company, when one of the box columns has NA values, it means that the item already comes in a box and is being shipped as is. So, I'd just need to replace the missing Box_Height, for instance, with the Item_Height.

I wrote the following code to do this:

df$Box_Height[is.na(df$Box_Height) & !is.na(df$Item_Height)] <- df$Item_Height

I'm ultimately trying to test when a row is missing a box dimension and the specific item dimension is not missing, then replace the missing box dimension with the item dimension.

I am receiving this error:

Error in df$Box_Height[is.na(df$Box_Height) &  : 
  NAs are not allowed in subscripted assignments

This is somewhat confusing since that's what I'm trying to replace.

If anyone has any advice as how to properly do this or where I'm going wrong I'd greatly appreciate the help.

Upvotes: 0

Views: 161

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26238

This syntax will do the required things to all attributes(dimensions) of box, simultaneously

library(dplyr)
df %>% mutate(across(starts_with("Box"), ~ ifelse(is.na(.x), 
                                                  get(str_replace(cur_column(), "Box", "Item")),
                                                  .x)))

# A tibble: 6 x 6
  Box_Height Box_Length Box_Width Item_Height Item_Length Item_Width
       <dbl>      <dbl>     <dbl>       <dbl>       <dbl>      <dbl>
1         NA         74         4          NA          NA         NA
2          6         42         6           6          42          6
3          6         22         6           6          22          6
4          6         42         6           6          42          6
5          6         42         6           6          42          6
6         NA         NA        NA          NA          NA         NA

Upvotes: 0

crestor
crestor

Reputation: 1486

I would recommend using the tidyverse syntax. And using if_else instead of ifelse.

library(tidyverse)

df <- tibble::tribble(
  ~Box_Height, ~Box_Length, ~Box_Width, ~Item_Height, ~Item_Length, ~Item_Width,
  NA,         74,          4,           NA,         NA,          NA,
  NA,         42,         NA,           6,          42,          6,
  6,          NA,         NA,           6,          22,          6,
  6,          NA,         NA,           6,          42,          6,
  6,          NA,         NA,           6,          42,          6,
  NA,         NA,         NA,           NA,         NA,          NA
)


df %>%
  mutate(Item_Height = if_else(
    is.na(Box_Height) & !is.na(Item_Height),
    Item_Height,
    Box_Height
  ))
#> # A tibble: 6 x 6
#>   Box_Height Box_Length Box_Width Item_Height Item_Length Item_Width
#>        <dbl>      <dbl>     <dbl>       <dbl>       <dbl>      <dbl>
#> 1         NA         74         4          NA          NA         NA
#> 2         NA         42        NA           6          42          6
#> 3          6         NA        NA           6          22          6
#> 4          6         NA        NA           6          42          6
#> 5          6         NA        NA           6          42          6
#> 6         NA         NA        NA          NA          NA         NA

Upvotes: 0

Damian
Damian

Reputation: 1433

Try using ifelse() to apply the same condition.

df$Box_Height <- ifelse(is.na(df$Box_Height) & !is.na(df$Item_Height), df$Item_Height, df$Box_Height)

The ifelse() function requires you to provide values for cases where the condition is true and false separately which ensures that the vector lengths will match. Subsetting df$Box_Height with [ probably results in a vector that is shorter than df$Item_Height, which is unsubsetted.

Upvotes: 1

Related Questions