Reputation: 137

dplyr: conditional column if not NA, to calculate an overall percent

This is probably a really simple question, but I'm stuck: how do I add a conditional column that contains 1 where the completed column is not NA?

   id        completed
   <chr>     <chr>    
 1 abc123sdf 35929    
 2 124cv     NA       
 3 125xvdf   36295    
 4 126v      NA       
 5 127sdsd   43933    
 6 128dfgs   NA       
 7 129vsd    NA       
 8 130sdf    NA       
 9 131sdf    NA       
10 123sdfd   NA 

I need this to calculate an overall percent: the number of non-NA completed values divided by the number of ids.
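Assuming "overall percent" means the share of ids whose completed value is not NA, the sample data above (3 non-NA values out of 10 ids) works out as:

```latex
\text{overall percent} = 100 \times \frac{\text{number of non-NA values in completed}}{\text{number of ids}} = 100 \times \frac{3}{10} = 30\%
```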

(Additional question: how can I do this in dplyr without using a helper column?)

Thanks

Upvotes: 0

Answers (3)

akrun

Reputation: 887571

We can also coerce the logical vector to integer with unary +:

library(dplyr)
df %>% 
     mutate(newcol = +(!is.na(completed)))
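Why this works: arithmetic on a logical vector coerces it to integer, so unary + turns TRUE into 1 and FALSE into 0. A quick base-R check, using a small made-up vector in place of the real completed column:

```r
# Hypothetical stand-in for the completed column (character, as in the question)
completed <- c("35929", NA, "36295", NA)

# !is.na() returns a logical vector; unary + coerces it to integer (TRUE -> 1, FALSE -> 0)
flag <- +(!is.na(completed))

print(flag)
print(class(flag))
```

The result is the same integer vector that as.integer(!is.na(completed)) gives in the other answers; + is just a shorter spelling.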

Upvotes: 0

Joe Roe

Reputation: 634

library("dplyr")
df <- data.frame(id = 1:10,
                 completed = c(35929, NA, 36295, NA, 43933, NA, NA, NA, NA, NA))

df %>% 
  mutate(is_na = as.integer(!is.na(completed)))
#>    id completed is_na
#> 1   1     35929     1
#> 2   2        NA     0
#> 3   3     36295     1
#> 4   4        NA     0
#> 5   5     43933     1
#> 6   6        NA     0
#> 7   7        NA     0
#> 8   8        NA     0
#> 9   9        NA     0
#> 10 10        NA     0
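If the goal is a single overall percentage, the 0/1 helper column can be collapsed with summarise(). A sketch, assuming "overall percent" means completed rows divided by all rows (one row per id, as in this toy data); the names n_completed, n_ids and pct_completed are illustrative:

```r
library(dplyr)

# Same toy data as in this answer
df <- data.frame(id = 1:10,
                 completed = c(35929, NA, 36295, NA, 43933, NA, NA, NA, NA, NA))

# sum() of the 0/1 helper counts completed rows; n() counts all rows (one per id here)
res <- df %>%
  mutate(is_na = as.integer(!is.na(completed))) %>%
  summarise(n_completed = sum(is_na),
            n_ids = n(),
            pct_completed = 100 * n_completed / n_ids)

print(res)
```

summarise() lets later arguments refer to columns created earlier in the same call, which is why pct_completed can use n_completed and n_ids.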

But you shouldn't need this extra column to calculate a percentage; you can just use na.rm:

df %>% 
  mutate(pct = completed / sum(completed, na.rm = TRUE))
#>    id completed       pct
#> 1   1     35929 0.3093141
#> 2   2        NA        NA
#> 3   3     36295 0.3124650
#> 4   4        NA        NA
#> 5   5     43933 0.3782209
#> 6   6        NA        NA
#> 7   7        NA        NA
#> 8   8        NA        NA
#> 9   9        NA        NA
#> 10 10        NA        NA
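The pct column above is each row's share of the summed completed values. If instead the overall percent is meant to be the share of ids with a non-NA completed value, a sketch that also avoids any helper column relies on the mean of a logical vector being the proportion of TRUE values:

```r
library(dplyr)

df <- data.frame(id = 1:10,
                 completed = c(35929, NA, 36295, NA, 43933, NA, NA, NA, NA, NA))

# mean() of a logical vector is the proportion of TRUE values,
# so the 0/1 column never has to be materialised
res <- df %>%
  summarise(pct_completed = 100 * mean(!is.na(completed)))

print(res)
```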

Upvotes: 1

Ronak Shah

Reputation: 389175

You can use is.na to check for NA values and as.integer to turn the resulting logical vector into 1/0.

library(dplyr)
df %>% mutate(newcol = as.integer(!is.na(completed)))

#          id completed newcol
#1  abc123sdf     35929      1
#2      124cv        NA      0
#3    125xvdf     36295      1
#4       126v        NA      0
#5    127sdsd     43933      1
#6    128dfgs        NA      0
#7     129vsd        NA      0
#8     130sdf        NA      0
#9     131sdf        NA      0
#10   123sdfd        NA      0
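To get both the completed and not-completed percentages in one small table, count() can group on the logical flag directly. A sketch using a reconstruction of the question's data (completed kept as character, as the printed tibble shows); the column name is_completed is illustrative:

```r
library(dplyr)

# Reconstruction of the question's data; completed is character, as in the printout
df <- tibble(
  id = c("abc123sdf", "124cv", "125xvdf", "126v", "127sdsd",
         "128dfgs", "129vsd", "130sdf", "131sdf", "123sdfd"),
  completed = c("35929", NA, "36295", NA, "43933", NA, NA, NA, NA, NA)
)

# count() tallies rows for each value of the flag; pct is each group's share of all rows
res <- df %>%
  count(is_completed = !is.na(completed)) %>%
  mutate(pct = 100 * n / sum(n))

print(res)
```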

Upvotes: 1
