Leondra

Reputation: 21

Getting duplicates in group_by() in R

I am getting duplicates in my group_by() results in R. Say I am trying to group the following data frame by name:

name <- c("John", "Sally", "Sally", "Sue")
sales <- c(10, 20, 5, 30)
example <- data.frame(name, sales)
print(example)

I wanted to create a table that shows the total sales for each salesperson, using the code below:

library(dplyr)

example %>% group_by(name) %>% select(name, sales)

However, I keep getting "Sally" listed twice. Instead, I want to get Sally only once with her total sales (25). How do I get distinct values in my "name" column? Been googling this all day as I thought group_by was supposed to do that.

Do I use distinct()? I saw a similar post for Python where the top contributor said the user should try using sort. I gave it a try, but now RStudio says it can't find the object "name" when I add sort() to get this code:

example %>% sort(name) %>% group_by(name) %>% select(name, sales)

But when I remove the sort() function, R managed to read "group_by(name)" just fine. What am I missing?

Thanks

Upvotes: 1

Views: 4846

Answers (1)

akrun

Reputation: 887851

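group_by() only marks which rows belong to which group; it does not collapse them, so select() still returns every row. We can use summarise() to sum 'sales' within each 'name' and get one row per name: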

example %>% 
    group_by(name) %>% 
    summarise(sales = sum(sales))
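
As a minimal, self-contained sketch of the above, assuming the data frame from the question: the object names totals and totals_sorted are only illustrative, and the arrange() step is an optional addition for the sorting part of the question (arrange() is dplyr's verb for ordering rows, whereas base sort() is meant for vectors).

```r
library(dplyr)

# Data frame from the question
example <- data.frame(
  name  = c("John", "Sally", "Sally", "Sue"),
  sales = c(10, 20, 5, 30)
)

# summarise() collapses each group to a single row,
# so Sally appears once with 20 + 5 = 25
totals <- example %>%
  group_by(name) %>%
  summarise(sales = sum(sales))

# Optional: order the rows by total sales, largest first
totals_sorted <- totals %>%
  arrange(desc(sales))

print(totals)
print(totals_sorted)
```

distinct(name) would also return each name once, but it drops the sales values rather than adding them up, so it does not give the totals asked for here.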

Upvotes: 1
