Reputation: 2126
I am struggling with some data wrangling in R. I have a wide data frame that contains values of 10 IDs, a number and five columns specifying if a test is positive, negative, or inconclusive.
df <- data.frame(ID = 1:10, number = rep(1:5, each=2),
`1`= c(rep(c("pos", "neg"),3), rep("neg", 4)),
`2`= c(NA, NA, rep("pos", 4), "neg", "neg", "pos", "neg"),
`3`= c(rep(NA, 4), rep("pos", 4), "neg", "inconcl"),
`4`= c(rep(NA, 6), rep("pos",3), "neg"),
`5`=c(rep(NA, 8), "pos", "pos"), stringsAsFactors = FALSE)
df
#> ID number X1 X2 X3 X4 X5
#> 1 1 1 pos <NA> <NA> <NA> <NA>
#> 2 2 1 neg <NA> <NA> <NA> <NA>
#> 3 3 2 pos pos <NA> <NA> <NA>
#> 4 4 2 neg pos <NA> <NA> <NA>
#> 5 5 3 pos pos pos <NA> <NA>
#> 6 6 3 neg pos pos <NA> <NA>
#> 7 7 4 neg neg pos pos <NA>
#> 8 8 4 neg neg pos pos <NA>
#> 9 9 5 neg pos neg pos pos
#> 10 10 5 neg neg inconcl neg pos
Per row, I would like to set all values after a positive pos
or inconclusive inconcl
value in the columns X1-X5
to NA
.
Expected output
#> ID number X1 X2 X3 X4 X5
#> 1 1 1 pos <NA> <NA> NA NA
#> 2 2 1 neg <NA> <NA> NA NA
#> 3 3 2 pos <NA> <NA> NA NA
#> 4 4 2 neg pos <NA> NA NA
#> 5 5 3 pos <NA> <NA> NA NA
#> 6 6 3 neg pos <NA> NA NA
#> 7 7 4 neg neg pos NA NA
#> 8 8 4 neg neg pos NA NA
#> 9 9 5 neg pos <NA> NA NA
#> 10 10 5 neg neg inconcl NA NA
Question
How do you do this with dplyr
in R?
Upvotes: 2
Views: 201
Reputation: 26238
I tried to do it for my own knowledge with map
and accumulate
within a single pipe syntax and it worked
library(tidyverse)
df %>% group_by(ID) %>%
nest(data = c(X1, X2, X3, X4, X5)) %>%
mutate(data = map(data,
~ .x %>%
pivot_longer(everything()) %>%
mutate(value = accumulate(value,
~ ifelse(.x %in% c("pos", "inconcl") | is.na(.x),
NA,
.y)
)
) %>%
pivot_wider(names_from = name, values_from = value)
)
) %>%
unnest_wider(data)
# A tibble: 10 x 7
# Groups: ID [10]
ID number X1 X2 X3 X4 X5
<int> <int> <chr> <chr> <chr> <chr> <chr>
1 1 1 pos NA NA NA NA
2 2 1 neg NA NA NA NA
3 3 2 pos NA NA NA NA
4 4 2 neg pos NA NA NA
5 5 3 pos NA NA NA NA
6 6 3 neg pos NA NA NA
7 7 4 neg neg pos NA NA
8 8 4 neg neg pos NA NA
9 9 5 neg pos NA NA NA
10 10 5 neg neg inconcl NA NA
Upvotes: 1
Reputation: 39174
Here is a solution using base R. However, the proposed function can easily be applied in a dplyr
pipeline as the following shows.
# Define a function to replace value with NA.
re_NA <- function(x){
# Get the target column
cols <- names(x)
target_cols <- cols[grepl("X", cols)]
# Use a for loop for the operation
for (i in 2:length(target_cols)){
# Define the reference column
ref_col <- target_cols[i - 1]
# Define the modified column
mod_col <- target_cols[i]
# Modify the values
x[[mod_col]] <- ifelse(x[[ref_col]] %in% c("pos", "inconcl") | is.na(x[[ref_col]]),
NA, x[[mod_col]])
}
return(x)
}
library(dplyr)
df %>% re_NA()
# ID number X1 X2 X3 X4 X5
# 1 1 1 pos <NA> <NA> NA NA
# 2 2 1 neg <NA> <NA> NA NA
# 3 3 2 pos <NA> <NA> NA NA
# 4 4 2 neg pos <NA> NA NA
# 5 5 3 pos <NA> <NA> NA NA
# 6 6 3 neg pos <NA> NA NA
# 7 7 4 neg neg pos NA NA
# 8 8 4 neg neg pos NA NA
# 9 9 5 neg pos <NA> NA NA
# 10 10 5 neg neg inconcl NA NA
Upvotes: 1
Reputation: 819
The following answer is within dplyr
as OP requested, also it is not dependent in the number of X
columns (i.e., you can have as many columns as you like, the answer still holds).
library(tidyverse)
df %>%
pivot_longer(cols = c(-ID, -number)) %>%
mutate(is_pos_inconcl = if_else(value %in% c("pos", "inconcl"), 1, 0)) %>%
group_by(ID, number) %>%
mutate(should_na = cumsum(cumsum(is_pos_inconcl))) %>%
mutate(value = if_else(should_na > 1, NA_character_, value)) %>%
select(ID, number, name, value) %>%
pivot_wider(names_from = "name",
values_from = "value")
The trick is to reshape the data to a long form (pivot_longer
), then make sure that the condition holds, and use cumsum
(twice, since a first positive is ok).
Finally, restructure the data back using pivot_wider
.
Upvotes: 2
Reputation: 41260
You could transpose the data to apply column-wise Reduce
:
library(dplyr)
cbind(df %>% select(!contains("X")),
df %>% select(contains("X")) %>%
t %>%
apply(2,function(col) Reduce(f = function(x,y) {ifelse(x=="pos"| x=='inconcl'|is.na(x),NA,y)}, x = col, accumulate = TRUE)) %>%
t)
ID number V1 V2 V3 V4 V5
1: 1 1 pos <NA> <NA> <NA> <NA>
2: 2 1 neg <NA> <NA> <NA> <NA>
3: 3 2 pos <NA> <NA> <NA> <NA>
4: 4 2 neg pos <NA> <NA> <NA>
5: 5 3 pos <NA> <NA> <NA> <NA>
6: 6 3 neg pos <NA> <NA> <NA>
7: 7 4 neg neg pos <NA> <NA>
8: 8 4 neg neg pos <NA> <NA>
9: 9 5 neg pos <NA> <NA> <NA>
10: 10 5 neg neg inconcl <NA> <NA>
Upvotes: 1
Reputation: 344
I was hoping to have a shorter answer, but you this would work
library(tidyr)
dfLong <- df %>%
pivot_longer(cols = starts_with("X"),
names_to = "TestCol")
dfFirstPosOrInconcl <- dfLong %>%
filter(value %in% c("pos", "inconcl")) %>%
group_by(ID) %>%
summarize(FirstPosOrInconcl = min(TestCol),
.groups = "drop")
dfLong %>%
left_join(dfFirstPosOrInconcl, by = "ID") %>%
filter(is.na(FirstPosOrInconcl) | TestCol <= FirstPosOrInconcl) %>%
pivot_wider(id_cols = c(ID, number),
names_from = TestCol,
values_from = value)
Upvotes: 2
Reputation: 473
You could to this with this (not pretty) code:
df %>%
mutate(X2 = ifelse(X1 %in% c("pos", "inconcl", NA), NA, X2)) %>%
mutate(X3 = ifelse(X2 %in% c("pos", "inconcl", NA), NA, X3)) %>%
mutate(X4 = ifelse(X3 %in% c("pos", "inconcl", NA), NA, X4)) %>%
mutate(X5 = ifelse(X4 %in% c("pos", "inconcl", NA), NA, X5))
For this to work, you need to set stringsAsFactors = FALSE
in the creation of df
. Otherwise the result will contain the factor number instead of the label.
Upvotes: 2