I have a dataframe containing an id column and scan results for three modalities. A 1 denotes that the result was not seen on a scan, a 2 that the result was seen, and an empty cell (no value) that the scan was not completed.
I wish to create one column at the end of the dataframe which checks all 3 scan columns and returns "2" if the result was ever seen on any of the 3 scans, "1" if it was never seen on a completed scan, and no value if the patient never had a scan completed on any of the three modalities.
I have tried doing this in Excel and R. I would prefer to use R, as I am learning it at the moment and want to keep learning new uses for it.
In R, I have tried:
library(tidyverse)
USS_reports %>%
mutate((filter(USSfluid=2 | CTfluid=2 | MRIfluid=2))
My data look like this (blank cells are shown as NA):
id USSFluid CTfluid MRIfluid
 1        1       1        1
 2        1       1       NA
 3        1       1        1
 4        1       1       NA
 5        1       1       NA
 6        1       1       NA
 7        1      NA       NA
 8        1      NA       NA
 9        1      NA       NA
10        1       2       NA
11        1       2       NA
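For reference, the sample data can be rebuilt in R as a plain data.frame. This is a sketch that assumes the blank cells are missing values (NA); the positions of the blanks match the all_str column in the second answer's reprex output.

```r
# Sample data from the question; blank (not completed) scans are entered as NA.
USS_reports <- data.frame(
  id       = 1:11,
  USSFluid = rep(1, 11),
  CTfluid  = c(1, 1, 1, 1, 1, 1, NA, NA, NA, 2, 2),
  MRIfluid = c(1, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA)
)

# Check that all scan columns are numeric
str(USS_reports)
```

The second answer refers to the same data as df.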
As you want to give the highest value precedence, you could just use apply to take the max value per row (MARGIN = 1) of the dataframe, excluding the first id column ([,-1]):
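USS_reports %>%
  # row-wise maximum of the scan columns (everything except id)
  mutate(summary = apply(USS_reports[, -1], MARGIN = 1,
                         FUN = function(row) max(row, na.rm = TRUE))) %>%
  # max() returns -Inf (with a warning) when every value in a row is NA
  mutate(summary = ifelse(summary == -Inf, NA, summary))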
Note that the second mutate is needed because max returns -Inf (with a warning) when all columns in a row are NA; it replaces those -Inf values with NA. For this to work, your df needs to be numeric, though. If it isn't, you would first have to do
USS_reports[] <- lapply(USS_reports, as.numeric)
(By the way, to test for equality in your code above, you have to use == instead of =.)
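A vectorized alternative to apply is base R's pmax, which takes the parallel (element-wise) maximum across the scan columns. With na.rm = TRUE it returns NA rather than -Inf when every value in a row is missing, so the second mutate is not needed. This is a swapped-in technique, not the answer's own code, sketched on a small made-up data frame that uses the question's column names; like the apply version, it assumes the scan columns are numeric.

```r
# Toy data with the question's column names (values are illustrative only).
USS_reports <- data.frame(
  id       = 1:4,
  USSFluid = c(1, 1, 1, NA),
  CTfluid  = c(1, 2, NA, NA),
  MRIfluid = c(1, NA, NA, NA)
)

# pmax() compares its arguments element by element, i.e. row by row here.
# do.call() passes every column except id, plus na.rm = TRUE, as arguments.
USS_reports$summary <- do.call(pmax, c(USS_reports[, -1], na.rm = TRUE))

# Expected: 1, 2, 1, NA (the last row had no completed scans)
USS_reports$summary
```

Inside a pipe, the same idea can be written as mutate(summary = pmax(USSFluid, CTfluid, MRIfluid, na.rm = TRUE)); naming the columns explicitly also avoids referring back to USS_reports from inside mutate.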
Here's a solution that is less straightforward at first glance, but is intended to scale to more than the 3 columns you're checking. I used gather to reshape the dataframe into a long format, made a single string of all the results for each id, and then used case_when to check for each of the possibilities: there's a result with a 2, there's a result with a 1, or there's no result. I like case_when because it avoids lots of ifelse calls nested inside each other.
I also added a test case for when there's no result, just to make sure that possibility comes out okay too.
library(tidyverse)
df %>%
  # test case with no results
  bind_rows(tibble(id = 12)) %>%
  # one row per id/scan pair
  gather(key = scan, value = result, -id) %>%
  group_by(id) %>%
  # collapse all of an id's results into one string, e.g. "1,2,NA"
  summarise(all_str = paste(result, collapse = ",")) %>%
  mutate(overall = case_when(
    str_detect(all_str, "2") ~ "2",
    str_detect(all_str, "1") ~ "1",
    TRUE ~ "no result"
  ))
#> # A tibble: 12 x 3
#>       id all_str  overall
#>    <dbl> <chr>    <chr>
#>  1    1. 1,1,1    1
#>  2    2. 1,1,NA   1
#>  3    3. 1,1,1    1
#>  4    4. 1,1,NA   1
#>  5    5. 1,1,NA   1
#>  6    6. 1,1,NA   1
#>  7    7. 1,NA,NA  1
#>  8    8. 1,NA,NA  1
#>  9    9. 1,NA,NA  1
#> 10   10. 1,2,NA   2
#> 11   11. 1,2,NA   2
#> 12   12. NA,NA,NA no result
Created on 2018-04-27 by the reprex package (v0.2.0).
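A sketch of the same long-format idea that tests the values directly instead of pasting them into a string, assuming tidyr 1.0.0 or later, where pivot_longer supersedes gather. Checking result == 2 with any() means a "2" can never be matched inside some longer code if other result values were ever added. The toy data below are illustrative, not the question's full dataset.

```r
library(dplyr)
library(tidyr)

# Toy data with the question's column names; id 3 has no completed scans.
df <- data.frame(
  id       = c(1, 2, 3),
  USSFluid = c(1, 1, NA),
  CTfluid  = c(1, 2, NA),
  MRIfluid = c(1, NA, NA)
)

overall <- df %>%
  # one row per id/scan pair
  pivot_longer(-id, names_to = "scan", values_to = "result") %>%
  group_by(id) %>%
  summarise(overall = case_when(
    any(result == 2, na.rm = TRUE) ~ "2",  # seen on at least one scan
    any(!is.na(result))            ~ "1",  # scanned, but never seen
    TRUE                           ~ "no result"
  ))

overall
```

Because the conditions are evaluated per id group, adding more scan columns needs no change to the summarise step.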