Reputation: 992
I'm trying to mutate a new variable from sort of row calculation,
say rowSums
as below
iris %>%
mutate_(sumVar =
iris %>%
select(Sepal.Length:Petal.Width) %>%
rowSums)
the result is that "sumVar" is truncated to its first value(10.2):
Source: local data frame [150 x 6]
Groups: <by row>
Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
1 5.1 3.5 1.4 0.2 setosa 10.2
2 4.9 3.0 1.4 0.2 setosa 10.2
3 4.7 3.2 1.3 0.2 setosa 10.2
4 4.6 3.1 1.5 0.2 setosa 10.2
5 5.0 3.6 1.4 0.2 setosa 10.2
6 5.4 3.9 1.7 0.4 setosa 10.2
..
Warning message:
Truncating vector to length 1
Should it be rowwise
applied? Or what's the right verb to use in these kind of calculations.
Edit:
More specifically, is there any way to realize the inline custom function with dplyr
?
I'm wondering if it is possible do something like:
iris %>%
mutate(sumVar = colsum_function(Sepal.Length:Petal.Width))
Upvotes: 86
Views: 103661
Reputation: 1282
As requested, transforming my commment into an answer:
For operations like sum that already have an efficient vectorised row-wise alternative, the proper way is currently:
df %>% mutate(total = rowSums(across(where(is.numeric))))
across
can take anything that select
can (e.g. rowSums(across(Sepal.Length:Petal.Width))
also works).
Scroll down the row-wise vignette to find this and have a look at across
Upvotes: 12
Reputation: 5682
You can use rowwise()
function:
iris %>%
rowwise() %>%
mutate(sumVar = sum(c_across(Sepal.Length:Petal.Width)))
#> # A tibble: 150 x 6
#> # Rowwise:
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sumVar
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 10.2
#> 2 4.9 3 1.4 0.2 setosa 9.5
#> 3 4.7 3.2 1.3 0.2 setosa 9.4
#> 4 4.6 3.1 1.5 0.2 setosa 9.4
#> 5 5 3.6 1.4 0.2 setosa 10.2
#> 6 5.4 3.9 1.7 0.4 setosa 11.4
#> 7 4.6 3.4 1.4 0.3 setosa 9.7
#> 8 5 3.4 1.5 0.2 setosa 10.1
#> 9 4.4 2.9 1.4 0.2 setosa 8.9
#> 10 4.9 3.1 1.5 0.1 setosa 9.6
#> # ... with 140 more rows
"c_across()
uses tidy selection syntax so you can to succinctly select many variables"'
Finally, if you want, you can use %>% ungroup
at the end to exit from rowwise.
Upvotes: 25
Reputation: 3568
You can also use a grep in place of contains
or matches
, just in case you need to get fancy with the regular expressions (matches
doesn't seem to much like negative lookaheads and the like in my experience).
iris %>% mutate(sumVar = rowSums(select(., grep("Sepal", names(.)))))
Upvotes: 1
Reputation: 2207
I am using this simple solution, which is a more robust modification of the answer by Davide Passaretti:
iris %>% select(Sepal.Length:Petal.Width) %>%
transmute(sumVar = rowSums(.)) %>% bind_cols(iris, .)
(But it requires a defined row order, which should be fine, unless you work with remote datasets perhaps..)
Upvotes: 3
Reputation: 70236
This is more of a workaround but could be used
iris %>% mutate(sumVar = rowSums(.[1:4]))
As written in comments, you can also use a select
inside of mutate to get the columns you want to sum up, for example
iris %>%
mutate(sumVar = rowSums(select(., contains("Sepal")))) %>%
head
or
iris %>%
mutate(sumVar = select(., contains("Sepal")) %>% rowSums()) %>%
head
Upvotes: 146
Reputation: 774
Adding @docendodiscimus's comment as an answer. +1 to him!
iris %>% mutate(sumVar = rowSums(select(., contains("Sepal"))))
Upvotes: 11
Reputation: 2771
A more complicated way would be:
iris %>% select(Sepal.Length:Petal.Width) %>%
mutate(sumVar = rowSums(.)) %>% left_join(iris)
Upvotes: 12