Reputation: 51

Replace NA values when they are in two adjacent columns

Hi this is an example of a similar dataframe I am working with. I have an experiment with 10 samples and two replicates

df <- data.frame("ID" = c(1,2,3,4,5,6,7,8,9,10),
                "Rep1" =  c(6,5,3,"Na","Na",9,4,"Na","Na",2),
                 "Rep2" = c(8,4,4,"Na",3,"Na",6,"Na",2,1))

I have different Na values, however, I only want to replace them with zeros in the samples 4 and 8 due to they are the only ones which have NA in both replicates. Then, other samples would maintain the "NA".

Upvotes: 3

Answers (4)

AnilGoyal

Reputation: 26238

Though it has been marked as solved, yet I propose a simple answer

df <- data.frame("ID" = c(1,2,3,4,5,6,7,8,9,10),
                 "Rep1" =  c(6,5,3,"Na","Na",9,4,"Na","Na",2),
                 "Rep2" = c(8,4,4,"Na",3,"Na",6,"Na",2,1))
library(dplyr)

df %>% group_by(ID) %>%
  mutate(replace(cur_data(), all(cur_data() == 'Na'), '0'))

#> # A tibble: 10 x 3
#> # Groups:   ID [10]
#>       ID Rep1  Rep2 
#>    <dbl> <chr> <chr>
#>  1     1 6     8    
#>  2     2 5     4    
#>  3     3 3     4    
#>  4     4 0     0    
#>  5     5 Na    3    
#>  6     6 9     Na   
#>  7     7 4     6    
#>  8     8 0     0    
#>  9     9 Na    2    
#> 10    10 2     1

df %>% rowwise() %>%
  mutate(replace(cur_data()[-1], all(cur_data()[-1] == 'Na'), '0'))

Upvotes: 1

Anoushiravan R

Reputation: 21938

You can also use the following solution. In the following solution we iterate over each row and detect corresponding index or indices that is (are) equal to Na then if there were more that one index we replace it with 0 otherwise the row will remain as it:

library(dplyr)
library(purrr)

df %>%
  pmap_df(., ~ {ind <- which(c(...) == "Na"); 
  if(length(ind) > 1) {
    replace(c(...), ind, "0")
  } else {
    c(...)
  } 
 }
) %>%
  mutate(across(ID, as.integer))

# A tibble: 10 x 3
      ID Rep1  Rep2 
   <int> <chr> <chr>
 1     1 6     8    
 2     2 5     4    
 3     3 3     4    
 4     4 0     0    
 5     5 Na    3    
 6     6 9     Na   
 7     7 4     6    
 8     8 0     0    
 9     9 Na    2    
10    10 2     1

P.S = I almost went crazy as why I could not get it to work only to realize your NAs are in fact Na.

Upvotes: 2

TarJae

Reputation: 79311

With dplyr we could:

library(dplyr)
df %>% 
  mutate(across(starts_with("Rep"), ~case_when(.=="Na" & ID==4 | ID==8 ~ "0",
                                               TRUE ~ .)))

Output:

   ID Rep1 Rep2
1   1    6    8
2   2    5    4
3   3    3    4
4   4    0    0
5   5   Na    3
6   6    9   Na
7   7    4    6
8   8    0    0
9   9   Na    2
10 10    2    1

Upvotes: 1

akrun

Reputation: 887971

We create an index where the 'Rep' columns are both "Na" with rowSums on a logical matrix. Use the row, column index/names to subset the data and assign the values to 0

nm1 <- grep("Rep", names(df), value = TRUE)
i1 <-  rowSums(df[nm1] == "Na") == length(nm1) 
df[i1, nm1] <- 0

-output

df
   ID Rep1 Rep2
1   1    6    8
2   2    5    4
3   3    3    4
4   4    0    0
5   5   Na    3
6   6    9   Na
7   7    4    6
8   8    0    0
9   9   Na    2
10 10    2    1

As the OP created string "Na", the column types are not numeric. We can convert this to numeric as

df[-1] <- lapply(df[-1], as.numeric)

forces the "Na" to be converted to NA

-output

df
   ID Rep1 Rep2
1   1    6    8
2   2    5    4
3   3    3    4
4   4    0    0
5   5   NA    3
6   6    9   NA
7   7    4    6
8   8    0    0
9   9   NA    2
10 10    2    1

Upvotes: 2

Replace NA values when they are in two adjacent columns

Answers (4)

Related Questions