user213544
user213544

Reputation: 2126

Set values to NA per row after specific value in R

I am struggling with some data wrangling in R. I have a wide data frame that contains values of 10 IDs, a number and five columns specifying if a test is positive, negative, or inconclusive.

df <- data.frame(ID = 1:10, number = rep(1:5, each=2), 
                   `1`= c(rep(c("pos", "neg"),3), rep("neg", 4)), 
                   `2`= c(NA, NA, rep("pos", 4), "neg", "neg", "pos", "neg"), 
                   `3`= c(rep(NA, 4), rep("pos", 4), "neg", "inconcl"), 
                   `4`= c(rep(NA, 6), rep("pos",3), "neg"), 
                   `5`=c(rep(NA, 8), "pos", "pos"), stringsAsFactors = FALSE)
df
#>    ID number  X1   X2      X3   X4   X5
#> 1   1      1 pos <NA>    <NA> <NA> <NA>
#> 2   2      1 neg <NA>    <NA> <NA> <NA>
#> 3   3      2 pos  pos    <NA> <NA> <NA>
#> 4   4      2 neg  pos    <NA> <NA> <NA>
#> 5   5      3 pos  pos     pos <NA> <NA>
#> 6   6      3 neg  pos     pos <NA> <NA>
#> 7   7      4 neg  neg     pos  pos <NA>
#> 8   8      4 neg  neg     pos  pos <NA>
#> 9   9      5 neg  pos     neg  pos  pos
#> 10 10      5 neg  neg inconcl  neg  pos

Per row, I would like to set all values after a positive pos or inconclusive inconcl value in the columns X1-X5 to NA.

Expected output

#>    ID number  X1   X2      X3 X4 X5
#> 1   1      1 pos <NA>    <NA> NA NA
#> 2   2      1 neg <NA>    <NA> NA NA
#> 3   3      2 pos <NA>    <NA> NA NA
#> 4   4      2 neg  pos    <NA> NA NA
#> 5   5      3 pos <NA>    <NA> NA NA
#> 6   6      3 neg  pos    <NA> NA NA
#> 7   7      4 neg  neg     pos NA NA
#> 8   8      4 neg  neg     pos NA NA
#> 9   9      5 neg  pos    <NA> NA NA
#> 10 10      5 neg  neg inconcl NA NA

Question

How do you do this with dplyr in R?

Upvotes: 2

Views: 201

Answers (6)

AnilGoyal
AnilGoyal

Reputation: 26238

I tried to do it for my own knowledge with map and accumulate within a single pipe syntax and it worked

library(tidyverse)
df %>% group_by(ID) %>%
  nest(data = c(X1, X2, X3, X4, X5)) %>%
  mutate(data = map(data, 
                    ~ .x %>% 
                      pivot_longer(everything()) %>%
                      mutate(value = accumulate(value, 
                                                ~ ifelse(.x %in% c("pos", "inconcl") | is.na(.x), 
                                                         NA,
                                                         .y)
                                                )
                             ) %>%
                      pivot_wider(names_from = name, values_from = value)
                    )
         ) %>%
  unnest_wider(data)

# A tibble: 10 x 7
# Groups:   ID [10]
      ID number X1    X2    X3      X4    X5   
   <int>  <int> <chr> <chr> <chr>   <chr> <chr>
 1     1      1 pos   NA    NA      NA    NA   
 2     2      1 neg   NA    NA      NA    NA   
 3     3      2 pos   NA    NA      NA    NA   
 4     4      2 neg   pos   NA      NA    NA   
 5     5      3 pos   NA    NA      NA    NA   
 6     6      3 neg   pos   NA      NA    NA   
 7     7      4 neg   neg   pos     NA    NA   
 8     8      4 neg   neg   pos     NA    NA   
 9     9      5 neg   pos   NA      NA    NA   
10    10      5 neg   neg   inconcl NA    NA

Upvotes: 1

www
www

Reputation: 39174

Here is a solution using base R. However, the proposed function can easily be applied in a dplyr pipeline as the following shows.

# Define a function to replace value with NA.
re_NA <- function(x){
  # Get the target column
  cols <- names(x)
  target_cols <- cols[grepl("X", cols)]
  
  # Use a for loop for the operation
  for (i in 2:length(target_cols)){
    # Define the reference column
    ref_col <- target_cols[i - 1]
    # Define the modified column 
    mod_col <- target_cols[i]
    # Modify the values
    x[[mod_col]] <- ifelse(x[[ref_col]] %in% c("pos", "inconcl") | is.na(x[[ref_col]]),
                            NA, x[[mod_col]])
  }
  
  return(x)
}

library(dplyr)

df %>% re_NA()
#    ID number  X1   X2      X3 X4 X5
# 1   1      1 pos <NA>    <NA> NA NA
# 2   2      1 neg <NA>    <NA> NA NA
# 3   3      2 pos <NA>    <NA> NA NA
# 4   4      2 neg  pos    <NA> NA NA
# 5   5      3 pos <NA>    <NA> NA NA
# 6   6      3 neg  pos    <NA> NA NA
# 7   7      4 neg  neg     pos NA NA
# 8   8      4 neg  neg     pos NA NA
# 9   9      5 neg  pos    <NA> NA NA
# 10 10      5 neg  neg inconcl NA NA

Upvotes: 1

Adi Sarid
Adi Sarid

Reputation: 819

The following answer is within dplyr as OP requested, also it is not dependent in the number of X columns (i.e., you can have as many columns as you like, the answer still holds).

library(tidyverse)

df %>% 
  pivot_longer(cols = c(-ID, -number)) %>% 
  mutate(is_pos_inconcl = if_else(value %in% c("pos", "inconcl"), 1, 0)) %>% 
  group_by(ID, number) %>% 
  mutate(should_na = cumsum(cumsum(is_pos_inconcl))) %>% 
  mutate(value = if_else(should_na > 1, NA_character_, value)) %>% 
  select(ID, number, name, value) %>% 
  pivot_wider(names_from = "name",
              values_from = "value") 

The trick is to reshape the data to a long form (pivot_longer), then make sure that the condition holds, and use cumsum (twice, since a first positive is ok). Finally, restructure the data back using pivot_wider.

Upvotes: 2

Waldi
Waldi

Reputation: 41260

You could transpose the data to apply column-wise Reduce :


library(dplyr)

cbind(df %>% select(!contains("X")),
      df %>% select(contains("X")) %>% 
        t %>%
        apply(2,function(col) Reduce(f = function(x,y) {ifelse(x=="pos"| x=='inconcl'|is.na(x),NA,y)}, x = col, accumulate = TRUE)) %>%
        t)

    ID number  V1   V2      V3   V4   V5
 1:  1      1 pos <NA>    <NA> <NA> <NA>
 2:  2      1 neg <NA>    <NA> <NA> <NA>
 3:  3      2 pos <NA>    <NA> <NA> <NA>
 4:  4      2 neg  pos    <NA> <NA> <NA>
 5:  5      3 pos <NA>    <NA> <NA> <NA>
 6:  6      3 neg  pos    <NA> <NA> <NA>
 7:  7      4 neg  neg     pos <NA> <NA>
 8:  8      4 neg  neg     pos <NA> <NA>
 9:  9      5 neg  pos    <NA> <NA> <NA>
10: 10      5 neg  neg inconcl <NA> <NA>

Upvotes: 1

Andreas
Andreas

Reputation: 344

I was hoping to have a shorter answer, but you this would work

library(tidyr)

dfLong <- df %>%
  pivot_longer(cols = starts_with("X"),
               names_to = "TestCol") 

dfFirstPosOrInconcl <- dfLong %>%
  filter(value %in% c("pos", "inconcl")) %>%
  group_by(ID) %>%
  summarize(FirstPosOrInconcl = min(TestCol),
            .groups = "drop")

dfLong %>%
  left_join(dfFirstPosOrInconcl, by = "ID") %>%
  filter(is.na(FirstPosOrInconcl) | TestCol <= FirstPosOrInconcl) %>%
  pivot_wider(id_cols = c(ID, number),
              names_from = TestCol,
              values_from = value)

Upvotes: 2

Jakob Gepp
Jakob Gepp

Reputation: 473

You could to this with this (not pretty) code:

df %>%
  mutate(X2 = ifelse(X1 %in% c("pos", "inconcl", NA), NA, X2)) %>%
  mutate(X3 = ifelse(X2 %in% c("pos", "inconcl", NA), NA, X3)) %>%
  mutate(X4 = ifelse(X3 %in% c("pos", "inconcl", NA), NA, X4)) %>%
  mutate(X5 = ifelse(X4 %in% c("pos", "inconcl", NA), NA, X5))

For this to work, you need to set stringsAsFactors = FALSE in the creation of df. Otherwise the result will contain the factor number instead of the label.

Upvotes: 2

Related Questions