Ruth Johnson

Reputation: 7

Create new column in R dataframe based on results from 3 other columns

I have a dataframe containing an id and scan results. A 1 means the result was not seen on the scan, a 2 means the result was seen, and the cell is left empty if the scan was not completed.

I wish to add one column at the end of the dataframe that checks all 3 columns and returns "2" if the result was ever seen in any of the 3 scans, "1" if the result was not seen on any completed scan, and no value if the patient never had a scan completed on any of the three modalities.

I have tried doing this in Excel and R. I would prefer to use R as I am learning this at the moment and want to continue learning new uses.

I have tried using

library(tidyverse)
USS_reports %>%
   mutate((filter(USSfluid=2 | CTfluid=2 | MRIfluid=2))

id  USSFluid    CTfluid MRIfluid
1       1             1        1
2       1                      1    
3       1             1        1
4       1             1 
5       1             1 
6       1             1 
7       1       
8                     1     
9       1       
10                    1       2 
11      1             2 

Upvotes: 0

Views: 240

Answers (2)

MartijnVanAttekum

Reputation: 1445

As you want to give the highest value precedence, you could just use apply to take the max value per row (MARGIN = 1) of the dataframe, excluding the first id column ([,-1]):

USS_reports %>%
  mutate(summary = apply(USS_reports[, -1], MARGIN = 1,
                         FUN = function(row) max(row, na.rm = TRUE))) %>%
  mutate(summary = ifelse(summary == -Inf, NA, summary))

Note that the second mutate is needed because max returns -Inf when all columns in a row are NA; it replaces those -Inf values with NA. For this to work, your dataframe needs to be numeric, though. If it is not, you would first have to do

USS_reports[] <- lapply(USS_reports, as.numeric)

(By the way, if you want to test for equality in your code above, you have to use == instead of =.)
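
The same row-wise maximum can probably also be taken with base R's pmax, which works element by element across columns and, with na.rm = TRUE, gives NA (not -Inf) for a row where every value is missing. This is only a minimal sketch, assuming the columns are already numeric; the small data frame `scans` and the column name `summary` are made up for illustration and are not the asker's data:

```r
# Hypothetical example data: NA marks a scan that was not completed
scans <- data.frame(
  id       = 1:4,
  USSFluid = c(1, 1, NA, NA),
  CTfluid  = c(1, NA, 2, NA),
  MRIfluid = c(1, NA, 1, NA)
)

# pmax() compares the columns element by element; with na.rm = TRUE
# a row comes out as NA only when all three scans are missing
scans$summary <- pmax(scans$USSFluid, scans$CTfluid, scans$MRIfluid,
                      na.rm = TRUE)

print(scans)
```

This avoids the second mutate, at the cost of naming each column explicitly.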

Upvotes: 0

camille

Reputation: 16871

Here's a solution that at first glance is less straightforward, but is intended to scale to more than the 3 columns you're checking. I gathered the dataframe into a long format, made a single string for each ID of all the results, then used a case_when to check for each of the possibilities: there's a result with a 2, there's a result with a 1, or there's no result. I like case_when because it avoids lots of ifelses nested inside each other.

I also added a test case for when there's no result, just to make sure that possibility comes out okay too.

library(tidyverse)

df %>%
# test case with no results
    bind_rows(tibble(id = 12)) %>%
    gather(key = scan, value = result, -id) %>%
    group_by(id) %>%
    summarise(all_str = paste(result, collapse = ",")) %>%
    mutate(overall = case_when(
        str_detect(all_str, "2") ~ "2",
        str_detect(all_str, "1") ~ "1",
        TRUE ~ "no result"
    ))

#> # A tibble: 12 x 3
#>       id all_str  overall  
#>    <dbl> <chr>    <chr>    
#>  1    1. 1,1,1    1        
#>  2    2. 1,1,NA   1        
#>  3    3. 1,1,1    1        
#>  4    4. 1,1,NA   1        
#>  5    5. 1,1,NA   1        
#>  6    6. 1,1,NA   1        
#>  7    7. 1,NA,NA  1        
#>  8    8. 1,NA,NA  1        
#>  9    9. 1,NA,NA  1        
#> 10   10. 1,2,NA   2        
#> 11   11. 1,2,NA   2        
#> 12   12. NA,NA,NA no result

Created on 2018-04-27 by the reprex package (v0.2.0).
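
In newer versions of tidyr, gather() is superseded by pivot_longer(). The sketch below is one possible variant of the same long-format idea, not the code that produced the output above: it compares numbers instead of matching strings. The data frame `scans` and the names `any_seen`, `any_scan` and `result_by_id` are invented for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical example data: NA marks a scan that was not completed
scans <- tibble(
  id       = 1:4,
  USSFluid = c(1, 1, NA, NA),
  CTfluid  = c(1, NA, 2, NA),
  MRIfluid = c(1, NA, 1, NA)
)

result_by_id <- scans %>%
  # one row per id/scan combination
  pivot_longer(cols = -id, names_to = "scan", values_to = "result") %>%
  group_by(id) %>%
  summarise(
    any_seen = any(result == 2, na.rm = TRUE),  # a 2 on at least one scan
    any_scan = any(!is.na(result))              # at least one completed scan
  ) %>%
  mutate(overall = case_when(
    any_seen ~ "2",
    any_scan ~ "1",
    TRUE     ~ "no result"
  ))

print(result_by_id)
```

Working on the numeric values means a code such as 12 or 21, if one were ever added, would not be mistaken for a 1 or a 2 the way a substring match could be.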

Upvotes: 1
