JHowIX

Reputation: 1803

When is operation complexity such that dplyr rowwise is needed?

According to the documentation the dplyr rowwise operator can be used to "support arbitrary complex operations that need to be applied to each row". I find this a little vague. For example, addition does not appear to rise to the level of complexity required for a rowwise:

library(dplyr)

df <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))
df %>%
  mutate(
    c = a + b
  )

  a b  c
1 1 5  6
2 2 6  8
3 3 7 10
4 4 8 12

But a very similar function, sum, does. For example:

df %>%
  mutate(
    d = sum(a,b)
  ) %>%
  rowwise() %>%
  mutate(
    e = sum(a,b)
  )

  a b  d  e
1 1 5 36  6
2 2 6 36  8
3 3 7 36 10
4 4 8 36 12

My question is, when exactly do we need to use rowwise in the course of dplyr operations? Is it any time the operation is not a basic arithmetic one, or are there other rules that determine whether an operation automatically treats its inputs row-wise versus column-wise?

Upvotes: 1

Views: 131

Answers (1)

davsjob

Reputation: 1960

I think the short answer is that sum and max are not "vectorised" in the element-wise sense: they accept multiple vectors and collapse all of them into a single aggregated value, which is a bit unexpected. I usually try to use functions that don't require rowwise, since it is slow and the risk of error is high. A solution to your simple case could be:

library(hablar)
library(dplyr)

df <- data.frame( a =  c(1,2,3,4), b = c(5,6,7,8)) 

df %>% mutate(c = row_sum(a:b))
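If you would rather not add a package, here is a rough sketch of the same idea using only dplyr and base R. It assumes dplyr 1.0 or later (for across and c_across). The helper range_len is a made-up function, used only to illustrate something that accepts single values and therefore does need rowwise:

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3, 4), b = c(5, 6, 7, 8))

# sum() collapses everything it is given into one number,
# which is why column d in the question is the same on every row
grand_total <- sum(df$a, df$b)

# `+`, rowSums() and pmax() return one value per row, so no rowwise() is needed
vectorised <- df %>%
  mutate(
    total   = rowSums(across(a:b)),  # row-by-row sum
    largest = pmax(a, b)             # row-by-row max
  )

# rowwise() gives the same sum, but evaluates the expression once per row
by_row <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(a:b))) %>%
  ungroup()

# Hypothetical helper: seq() needs scalar endpoints, so this function
# cannot be applied to whole columns at once and is a case for rowwise()
range_len <- function(lo, hi) length(seq(lo, hi))

needs_rowwise <- df %>%
  rowwise() %>%
  mutate(n = range_len(a, b)) %>%
  ungroup()
```

So the rule of thumb I would suggest: if the function returns one value per element of its input (like `+` or pmax), you can call it directly inside mutate. If it reduces its input to a single value (like sum or max) or only works on scalars, you need either a row-aware alternative or rowwise.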

Upvotes: 1
