leoluyi
leoluyi

Reputation: 992

dplyr mutate rowSums calculations or custom functions

I'm trying to mutate a new variable from sort of row calculation, say rowSums as below

iris %>% 
  mutate_(sumVar = 
            iris %>% 
            select(Sepal.Length:Petal.Width) %>%
            rowSums)

the result is that "sumVar" is truncated to its first value(10.2):

Source: local data frame [150 x 6]
Groups: <by row>

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
1           5.1         3.5          1.4         0.2  setosa   10.2
2           4.9         3.0          1.4         0.2  setosa   10.2
3           4.7         3.2          1.3         0.2  setosa   10.2
4           4.6         3.1          1.5         0.2  setosa   10.2
5           5.0         3.6          1.4         0.2  setosa   10.2
6           5.4         3.9          1.7         0.4  setosa   10.2
..
Warning message:
Truncating vector to length 1 

Should it be rowwise applied? Or what's the right verb to use in these kind of calculations.

Edit:

More specifically, is there any way to realize the inline custom function with dplyr?

I'm wondering if it is possible do something like:

iris %>% 
  mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))

Upvotes: 86

Views: 103661

Answers (7)

Fons MA
Fons MA

Reputation: 1282

As requested, transforming my commment into an answer:

For operations like sum that already have an efficient vectorised row-wise alternative, the proper way is currently:

df %>% mutate(total = rowSums(across(where(is.numeric))))

across can take anything that select can (e.g. rowSums(across(Sepal.Length:Petal.Width)) also works).

Scroll down the row-wise vignette to find this and have a look at across

Upvotes: 12

HBat
HBat

Reputation: 5682

You can use rowwise() function:

iris %>% 
  rowwise() %>% 
  mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))

#> # A tibble: 150 x 6
#> # Rowwise: 
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa    10.2
#>  2          4.9         3            1.4         0.2 setosa     9.5
#>  3          4.7         3.2          1.3         0.2 setosa     9.4
#>  4          4.6         3.1          1.5         0.2 setosa     9.4
#>  5          5           3.6          1.4         0.2 setosa    10.2
#>  6          5.4         3.9          1.7         0.4 setosa    11.4
#>  7          4.6         3.4          1.4         0.3 setosa     9.7
#>  8          5           3.4          1.5         0.2 setosa    10.1
#>  9          4.4         2.9          1.4         0.2 setosa     8.9
#> 10          4.9         3.1          1.5         0.1 setosa     9.6
#> # ... with 140 more rows

"c_across() uses tidy selection syntax so you can to succinctly select many variables"'

Finally, if you want, you can use %>% ungroup at the end to exit from rowwise.

Upvotes: 25

llewmills
llewmills

Reputation: 3568

You can also use a grep in place of contains or matches, just in case you need to get fancy with the regular expressions (matches doesn't seem to much like negative lookaheads and the like in my experience).

iris %>% mutate(sumVar = rowSums(select(., grep("Sepal", names(.)))))

Upvotes: 1

Melkor.cz
Melkor.cz

Reputation: 2207

I am using this simple solution, which is a more robust modification of the answer by Davide Passaretti:

iris %>% select(Sepal.Length:Petal.Width) %>%
  transmute(sumVar = rowSums(.)) %>% bind_cols(iris, .)

(But it requires a defined row order, which should be fine, unless you work with remote datasets perhaps..)

Upvotes: 3

talat
talat

Reputation: 70236

This is more of a workaround but could be used

iris %>% mutate(sumVar = rowSums(.[1:4]))

As written in comments, you can also use a select inside of mutate to get the columns you want to sum up, for example

iris %>% 
  mutate(sumVar = rowSums(select(., contains("Sepal")))) %>% 
  head 

or

iris %>% 
  mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>% 
  head

Upvotes: 146

psychonomics
psychonomics

Reputation: 774

Adding @docendodiscimus's comment as an answer. +1 to him!

iris %>% mutate(sumVar = rowSums(select(., contains("Sepal"))))

Upvotes: 11

Davide Passaretti
Davide Passaretti

Reputation: 2771

A more complicated way would be:

 iris %>% select(Sepal.Length:Petal.Width) %>%
mutate(sumVar = rowSums(.)) %>% left_join(iris)

Upvotes: 12

Related Questions