Reputation: 1118
I am hoping to calculate a sum, ignoring duplicated values in a categorical column. Take the following data, inspired by the question Sum by distinct column value in R :
other_shop <- data.frame('shop_id' = c(1, 1, 1, 2, 3, 3),
'shop_name' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'),
'city' = c('London', 'London', 'Paris', 'Cardiff', 'Dublin', 'Dublin'),
'city_sale_goal' = c(12, 12, 9, 15, 10, 10),
'profit' = c(3, 1, 3, 6, 5, 9))
In this case, Shop A has multiple cities, with each city having a sales goal associated with it. What I'd like to calculate is the total sales goal of each shop, where the duplicated values in city are ignored. As London appears twice for Shop A, it should only be 12 + 9 = 21 to get the shop's total sales goal. At the same time, I'm hoping to preserve my data such that the total profit can be calculated, and the duplicated values of city needn't influence the simple group_by
summing that I'd normally use.
Is there a sum function that I could use to get the shop_total_sale_goal
column in the goal dataframe below, without having to break this down into two parts and join? Alternatively, a function that would take only the first value in a numeric column for each distinct value in a categorical column?
goal<-data.frame('shop_id' = c(1, 1, 1, 2, 3, 3),
'shop_name' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'),
'city' = c('London', 'London', 'Paris', 'Cardiff', 'Dublin', 'Dublin'),
'city_sale_goal' = c(12, 12, 9, 15, 10, 10),
'profit' = c(3, 1, 3, 6, 5, 9),
'shop_total_sale_goal'=c(21,21,21,15,10,10),
'shop_profit'=c(7,7,7,6,14,14))
Upvotes: 1
Views: 706
Reputation: 887981
We can wrap with unique
before getting the sum
library(dplyr)
other_shop %>%
group_by(shop_id, shop_name) %>%
mutate(shop_total_sale_goal = sum(unique(city_sale_goal), na.rm = TRUE),
shop_profit = sum(profit, na.rm = TRUE)) %>%
ungroup
-output
# A tibble: 6 x 7
# shop_id shop_name city city_sale_goal profit shop_total_sale_goal shop_profit
# <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#1 1 Shop A London 12 3 21 7
#2 1 Shop A London 12 1 21 7
#3 1 Shop A Paris 9 3 21 7
#4 2 Shop B Cardiff 15 6 15 6
#5 3 Shop C Dublin 10 5 10 14
#6 3 Shop C Dublin 10 9 10 14
Upvotes: 1