KillerSnail
KillerSnail

Reputation: 3591

how do I translate this sql to dplyr

I want to do the equivalent of this SQL code:

SELECT Cat_1, Cat_2, Cat_3, Sum(cost), Sum(quantity) FROM tbl_of_data
GROUP BY Cat_1, Cat_2, Cat_3;

How do I do that using the dplyr package in R?

I tried:

library(dplyr)
tbl_of_data %>%
  group_by(Cat_1, Cat_2, Cat_3) %>%
  mutate(cost, quantity)

but I'm ending up with the same number of rows.

Upvotes: 1

Views: 655

Answers (1)

akrun
akrun

Reputation: 886938

We need either summarise or summarise_each to get a single row of sum for each combination of groups which would be equivalent to the sql output.

tbl_of_data %>%
       group_by(Cat_1, Cat_2, Cat_3) %>% 
       summarise_each(funs(sum), cost, quantity)

Or

tbl_of_data %>%
       group_by(Cat_1, Cat_2, Cat_3) %>% 
       summarise(cost=sum(cost), quantity=sum(quantity))

mutate creates a new column. In addition, in the OP's mutate code, it didn't wrap with the sum. If the intention is to create new sum columns, either mutate or mutate_each can be used i.e. ...%>% mutate(SumCost= sum(cost), SumQuantity = sum(quantity)) or ...%>% mutate_each(funs(sum), cost, quantity)

Upvotes: 4

Related Questions