Drew

Reputation: 133

Can I overlook a missing variable in a summing part of a function?

This is a shortened version of my real data frame. I have a function, calc, which creates a new variable called 'total'; for simplicity, it adds up three variables: a, b and c. When I pass the function a data frame that lacks one of those variables (say c, so it only has a and b), the function fails. Is there a function or simple way to sum the variables even when some of them are missing?

calc <- function(x)  {x %>% mutate(total = a + b + c)}

data.2 has two columns, a and b, with many rows of values, but when I run it through the function, c cannot be found, so total is not calculated.

new.df <- calc(data.2)

Many thanks.

Upvotes: 0

Views: 125

Answers (2)

Wil

Reputation: 3188

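You can use rowwise() and c_across() with any_of() (or any other tidyselect helper) from dplyr (>= 1.0.0). any_of() selects whichever of the named columns exist and ignores the rest, so a missing c is not an error.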

library(dplyr)

df <- data.frame(a = rnorm(10), b = rnorm(10))
dfc <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

calc <- function(x) {
  x %>% 
    rowwise() %>%
    mutate(total = sum(c_across(any_of(c("a", "b", "c"))))) %>%
    ungroup()
}

calc(df)
#> # A tibble: 10 x 3
#>         a      b   total
#>     <dbl>  <dbl>   <dbl>
#>  1 -0.884  0.851 -0.0339
#>  2 -1.56  -0.464 -2.02  
#>  3 -0.884  0.815 -0.0689
#>  4 -1.46  -0.259 -1.71  
#>  5  0.211 -0.528 -0.317 
#>  6  1.85   0.190  2.04  
#>  7 -1.31  -0.921 -2.23  
#>  8  0.450  0.394  0.845 
#>  9 -1.14   0.428 -0.714 
#> 10 -1.11   0.417 -0.698

calc(dfc)
#> # A tibble: 10 x 4
#>          a      b      c   total
#>      <dbl>  <dbl>  <dbl>   <dbl>
#>  1 -0.0868  0.632  1.81   2.36  
#>  2  0.568  -0.523  0.240  0.286 
#>  3 -0.0325  0.377 -0.437 -0.0921
#>  4  0.660   0.456  1.28   2.39  
#>  5 -0.123   1.75  -1.03   0.599 
#>  6  0.641   1.39   0.902  2.93  
#>  7  0.266   0.520  0.904  1.69  
#>  8 -1.53    0.319  0.439 -0.776 
#>  9  0.942   0.468 -1.69  -0.277 
#> 10  0.254  -0.600 -0.196 -0.542
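
To see why the function above copes with a missing c, it may help to look at any_of() on its own. This is a minimal sketch; df_ab is a made-up data frame used only for illustration.

```r
library(dplyr)

# Hypothetical two-column data frame (there is no column c)
df_ab <- data.frame(a = c(1, 2), b = c(3, 4))

# any_of() keeps the names that exist and silently skips the rest
present <- select(df_ab, any_of(c("a", "b", "c")))
names(present)

# all_of() is the strict counterpart: it errors when a name is missing
strict <- try(select(df_ab, all_of(c("a", "b", "c"))), silent = TRUE)
inherits(strict, "try-error")
```

If a missing column should be treated as a mistake rather than ignored, all_of() is probably the safer choice.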

If you want to generalize beyond those three variables, you can use any tidyselect helper, for example everything() to sum across every column.

df <- data.frame(a = rnorm(10), b = rnorm(10))
dfc <- data.frame(a = rnorm(10), b = rnorm(10), c = rnorm(10))

calc <- function(x) {
  x %>% 
    rowwise() %>%
    mutate(total = sum(c_across(everything()))) %>%
    ungroup()
}

calc(df)
#> # A tibble: 10 x 3
#>          a      b  total
#>      <dbl>  <dbl>  <dbl>
#>  1  0.775   1.17   1.95 
#>  2 -1.05    1.21   0.155
#>  3  2.07   -0.264  1.81 
#>  4  1.11    0.793  1.90 
#>  5 -0.700  -0.216 -0.916
#>  6 -1.04   -1.03  -2.07 
#>  7 -0.525   1.60   1.07 
#>  8  0.354   0.828  1.18 
#>  9  0.126   0.110  0.236
#> 10 -0.0954 -0.603 -0.698

calc(dfc)
#> # A tibble: 10 x 4
#>          a       b       c     total
#>      <dbl>   <dbl>   <dbl>     <dbl>
#>  1 -0.616   0.767   0.0462  0.196   
#>  2 -0.370  -0.538  -0.186  -1.09    
#>  3  0.337   1.11   -0.700   0.751   
#>  4 -0.993  -0.531  -0.984  -2.51    
#>  5  0.0538  1.50   -0.0808  1.47    
#>  6 -0.907  -1.54   -0.734  -3.18    
#>  7 -1.65   -0.242   1.43   -0.455   
#>  8 -0.166   0.447  -0.281  -0.000524
#>  9  0.0637 -0.0185  0.754   0.800   
#> 10  1.81   -1.09   -2.15   -1.42

Created on 2020-09-10 by the reprex package (v0.3.0)
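
One caveat with everything(): it sums every column, so it only works when all columns are numeric. A possible variant, assuming the real data also has non-numeric columns (the id column and the name calc_numeric below are invented for this sketch), is to select with where(is.numeric) instead.

```r
library(dplyr)

# Hypothetical data with a character column alongside the numeric ones
df_id <- data.frame(id = c("x", "y"), a = c(1, 2), b = c(3, 4))

# Sketch: sum only the numeric columns in each row
calc_numeric <- function(x) {
  x %>%
    rowwise() %>%
    mutate(total = sum(c_across(where(is.numeric)))) %>%
    ungroup()
}

res_numeric <- calc_numeric(df_id)
res_numeric$total
```

Note that this sums every numeric column, including ones you may not want in the total (a numeric ID, for instance), so an explicit any_of() list is still the more predictable option.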

Upvotes: 1

Ronak Shah

Reputation: 389135

If you want a row-wise sum or mean, rowSums() and rowMeans() have an na.rm argument that you can use to ignore NA values.

library(dplyr)
calc <- function(x) {x %>% mutate(total = rowSums(select(., a:c), na.rm = TRUE))}
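
As far as I can tell, the a:c range still needs a column named c to exist, so na.rm alone covers NA values but not an absent column. A sketch that combines rowSums() with any_of() to handle both cases follows; calc_any and its vars argument are names invented here, not part of the original answer.

```r
library(dplyr)

# Sketch: sum whichever of the expected columns are present, ignoring NAs
calc_any <- function(x, vars = c("a", "b", "c")) {
  present <- select(x, any_of(vars))                 # skips missing columns
  mutate(x, total = rowSums(present, na.rm = TRUE))  # skips NA values
}

# Hypothetical input: no column c, and one NA in a
df_na <- data.frame(a = c(1, NA), b = c(2, 3))
res_any <- calc_any(df_na)
res_any$total
```

This avoids rowwise() entirely, which tends to matter for speed on data frames with many rows.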

In the general case, if you cannot find a function that gives you an out-of-the-box solution, you can replace the NA values with 0 and then perform whatever operation you want.

calc <- function(x)  {
  x %>% 
     # replace NA with 0 in columns a to c, then add them up
     mutate(across(a:c, ~ tidyr::replace_na(.x, 0)), 
            total = a + b + c)
}
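
The same "fill with 0" idea can be stretched to a column that is missing altogether: create it as zeros first, and the original a + b + c formula then runs unchanged. This is only a sketch under that assumption; calc_fill and expected are hypothetical names.

```r
library(dplyr)

# Sketch: add any absent expected column as zeros, then sum as before
calc_fill <- function(x) {
  expected <- c("a", "b", "c")
  for (v in setdiff(expected, names(x))) {
    x[[v]] <- rep(0, nrow(x))  # placeholder column of zeros
  }
  x %>% mutate(total = a + b + c)
}

# Hypothetical input without column c
df_two <- data.frame(a = c(1, 2), b = c(3, 4))
res_fill <- calc_fill(df_two)
res_fill$total
```

Unlike the any_of() approaches, this leaves a column c of zeros in the result, which may or may not be what you want downstream.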

Upvotes: 2
