I am getting duplicates in my group_by() results in R. Say I am trying to group the following data frame by name:
name <- c("John", "Sally", "Sally", "Sue")
sales <- c(10, 20, 5, 30)
example <- data.frame(name, sales)
print(example)
I want to create a table that shows the total sales for each salesperson, using the code below:
library(dplyr)
example %>% group_by(name) %>% select(name, sales)
However, I keep getting "Sally" listed twice. Instead, I want Sally to appear only once, with her total sales (25). How do I get distinct values in my "name" column? I've been googling this all day, because I thought group_by() was supposed to do that.
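One way to see why Sally still shows up twice: group_by() only attaches grouping information to the data frame; it does not collapse any rows. A minimal check, recreating the example data from above and assuming dplyr is installed:

```r
library(dplyr)

# Recreate the example data from the question
example <- data.frame(
  name  = c("John", "Sally", "Sally", "Sue"),
  sales = c(10, 20, 5, 30)
)

grouped <- example %>% group_by(name)

nrow(grouped)       # 4: group_by() keeps every original row
n_groups(grouped)   # 3: John, Sally and Sue form three groups
group_vars(grouped) # "name": the column the data is grouped by
```

The grouping only takes effect when a later verb uses it, such as summarise() or mutate(); select() does not combine rows, so all four rows are printed.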
Do I use distinct()? I saw a similar post for Python here, and the top contributor said the user should try using sort. I gave it a try, but all of a sudden RStudio says it can't find the object "names" when I add it, giving this code:
example %>% sort(name) group_by(name) %>% select(name, sales)
But when I remove the sort() function, R reads "group_by(name)" just fine. What am I missing?
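A sketch of the two ideas raised above, under the same example data. distinct() does remove the duplicate name, but it keeps only the listed columns, so Sally's total is lost. sort() is a base R function for sorting vectors, not a dplyr verb; dplyr's verb for ordering rows is arrange(), and every step in the chain needs its own %>% (the sort() line in the question is missing one before group_by()).

```r
library(dplyr)

example <- data.frame(
  name  = c("John", "Sally", "Sally", "Sue"),
  sales = c(10, 20, 5, 30)
)

# distinct() keeps one row per name, but the sales column is dropped,
# so it cannot give Sally's total on its own
unique_names <- example %>% distinct(name)

# arrange() orders the rows; note the %>% between every step.
# This still returns 4 rows, because nothing here collapses the groups.
sorted <- example %>%
  arrange(name) %>%
  group_by(name) %>%
  select(name, sales)
```

Neither sorting nor distinct() adds up the sales; that requires a summarising step.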
Thanks
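We can use summarise() to sum the 'sales' grouped by 'name'. group_by() on its own only marks the groups; it is summarise() that collapses each group into a single row: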
example %>%
group_by(name) %>%
summarise(sales = sum(sales))
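For comparison, a self-contained sketch of the summarise() answer next to two equivalent alternatives: dplyr's count() with a weight column (which names its result column n), and base R's aggregate() for use without dplyr.

```r
library(dplyr)

example <- data.frame(
  name  = c("John", "Sally", "Sally", "Sue"),
  sales = c(10, 20, 5, 30)
)

# summarise() collapses each name group into one row with the summed sales
totals <- example %>%
  group_by(name) %>%
  summarise(sales = sum(sales))

# count() with wt = sales sums the weights per name; the result column is n
totals_count <- example %>% count(name, wt = sales)

# Base R equivalent using a formula: sum sales within each name
totals_base <- aggregate(sales ~ name, data = example, FUN = sum)
```

All three produce one row per salesperson: John 10, Sally 25, Sue 30.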