Mike
Mike

Reputation: 2097

Replace NA in a Dataframe Column with a Value Only when Two Other Columns Are Also NA

library(tidyverse)

Using the sample data below, I'm trying to replace the NA's in Col1 with the number 22222, but only when all three columns are NA. So the final result should only have 22222 in Col1 for rows 4 and 7.

I would like to use tidyverse and I'm attempting something along the lines of:

DF%>%mutate_at(vars(Col1),funs(replace(.,if_else(is.na(one_of(Col1,Col2,Col3),22222,.)))))

Sample Data:(not sure if this is the correct way to create real "NA"'s (that work with is.na) in the sample data? My real data has blank cells in the Excel file, which when converted to CSV and imported to R results in NA's.)

Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
Col1<-c(34564,NA,43456,NA,45655,6789,NA,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
DF<-data_frame(ID,Col1,Col2,Col3)

Upvotes: 0

Views: 299

Answers (3)

cderv
cderv

Reputation: 6542

This solution works for any number of column. It will replace the value with 22222 for each rows that is all NA value in each column

library(dplyr, warn.conflicts = FALSE)
Col1<-c(34564,NA,43456,NA,45655,6789,99999,87667)
Col2<-c(34565,43456,55555,NA,65433,22234,NA,98909)
Col3<-c(45673,88789,11123,NA,55676,76566,NA,NA)
DF<-data_frame(Col1,Col2,Col3)

# Find the rows with all NA. Works with any number of column
all_na <- DF %>%
  is.na() %>%
  apply(1, all)

# Replace the value from this rows with 2222 and keep others
DF %>%
  mutate_all(funs(if_else(all_na, 22222, .)))
#> # A tibble: 8 x 3
#>    Col1  Col2  Col3
#>   <dbl> <dbl> <dbl>
#> 1 34564 34565 45673
#> 2    NA 43456 88789
#> 3 43456 55555 11123
#> 4 22222 22222 22222
#> 5 45655 65433 55676
#> 6  6789 22234 76566
#> 7 99999    NA    NA
#> 8 87667 98909    NA

Upvotes: 0

MKR
MKR

Reputation: 20085

One solution could be to use mapply function.

#Define a function to replace missing row values
replMissing <- function(x, y, z){
  ifelse(is.na(x) & is.na(y) & is.na(z), 22222, x )
}
# Call mapply and pass value of Col1, Col2 and Col3
DF$Col1 <- mapply(replMissing, DF$Col1, DF$Col2, DF$Col3)

#results
> DF
# A tibble: 8 x 4
     ID  Col1  Col2  Col3
  <dbl> <dbl> <dbl> <dbl>
1 34564 34564 34565 45673
2    NA    NA 43456 88789
3 43456 43456 55555 11123
4    NA 22222    NA    NA
5 45655 45655 65433 55676
6  6789  6789 22234 76566
7 99999 22222    NA    NA
8 87667 87667 98909    NA

The solution will be much simpler using data.table.

DF <- data.table(DF)
DF[is.na(Col1) & is.na(Col2) & is.na(Col3), Col1 := 22222]
# Result
> DF
      ID  Col1  Col2  Col3
1: 34564 34564 34565 45673
2:    NA    NA 43456 88789
3: 43456 43456 55555 11123
4:    NA 22222    NA    NA
5: 45655 45655 65433 55676
6:  6789  6789 22234 76566
7: 99999 22222    NA    NA
8: 87667 87667 98909    NA

Upvotes: 1

Kevin Arseneau
Kevin Arseneau

Reputation: 6264

Your question has a few errors, so my answer will attempt to fill in the blanks. The data frame you have provided does not contain id, for example. I have modified your sample to make this reproducible.

library(dplyr)

df <- tibble(
  id = c(34564, NA, 43456, NA, 45655, 6789, 99999, 87667),
  col1 = c(45673, 88789, 11123, NA, 55676, 76566, NA, NA),
  col2 = c(34564, NA, 43456, NA, 45655, 6789, NA, 87667),
  col3 = c(34565, 43456, 55555, NA, 65433, 22234, NA, 98909)
)

To solve a single column, you can just use if/else in a normal mutate.

df %>%
  mutate(col1 = if_else(
    is.na(col1) & is.na(col2) & is.na(col3), 22222, col1
  ))

# # A tibble: 8 x 4
#      id  col1  col2  col3
#   <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2    NA 88789    NA 43456
# 3 43456 11123 43456 55555
# 4    NA 22222    NA    NA
# 5 45655 55676 45655 65433
# 6  6789 76566  6789 22234
# 7 99999 22222    NA    NA
# 8 87667    NA 87667 98909

Your question suggests you actually want each column mutated, not only col1. You can substitute the funs(replace()) approach you started with to just apply the earlier if/else to each column listed in vars.

df %>%
  mutate_at(
    vars(col1, col2, col3),
    ~if_else(is.na(df$col1) & is.na(df$col2) & is.na(df$col3), 22222, .)
  )

# # A tibble: 8 x 4
#      id  col1  col2  col3
#   <dbl> <dbl> <dbl> <dbl>
# 1 34564 45673 34564 34565
# 2    NA 88789    NA 43456
# 3 43456 11123 43456 55555
# 4    NA 22222 22222 22222
# 5 45655 55676 45655 65433
# 6  6789 76566  6789 22234
# 7 99999 22222 22222 22222
# 8 87667    NA 87667 98909

Upvotes: 0

Related Questions