Pake
Pake

Reputation: 1118

Sum by distinct column value in R, ignoring duplicate values

I am hoping to calculate a sum, ignoring duplicated values in a categorical column. Take the following data, inspired by the question Sum by distinct column value in R :

other_shop <- data.frame('shop_id' = c(1, 1, 1, 2, 3, 3), 
                   'shop_name' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'), 
                   'city' = c('London', 'London', 'Paris', 'Cardiff', 'Dublin', 'Dublin'), 
                   'city_sale_goal' = c(12, 12, 9, 15, 10, 10), 
                   'profit' = c(3, 1, 3, 6, 5, 9))

In this case, Shop A has multiple cities, with each city having a sales goal associated with it. What I'd like to calculate is the total sales goal of each shop, where the duplicated values in city are ignored. As London appears twice for Shop A, it should only be 12 + 9 = 21 to get the shop's total sales goal. At the same time, I'm hoping to preserve my data such that the total profit can be calculated, and the duplicated values of city needn't influence the simple group_by summing that I'd normally use.

Is there a sum function that I could use to get the shop_total_sale_goal column in the goal dataframe below, without having to break this down into two parts and join? Alternatively, a function that would take only the first value in a numeric column for each distinct value in a categorical column?

goal<-data.frame('shop_id' = c(1, 1, 1, 2, 3, 3), 
             'shop_name' = c('Shop A', 'Shop A', 'Shop A', 'Shop B', 'Shop C', 'Shop C'), 
             'city' = c('London', 'London', 'Paris', 'Cardiff', 'Dublin', 'Dublin'), 
             'city_sale_goal' = c(12, 12, 9, 15, 10, 10), 
             'profit' = c(3, 1, 3, 6, 5, 9),
             'shop_total_sale_goal'=c(21,21,21,15,10,10),
             'shop_profit'=c(7,7,7,6,14,14))

Upvotes: 1

Views: 706

Answers (1)

akrun
akrun

Reputation: 887981

We can wrap with unique before getting the sum

library(dplyr)
other_shop %>%
    group_by(shop_id, shop_name) %>%
    mutate(shop_total_sale_goal = sum(unique(city_sale_goal), na.rm = TRUE), 
           shop_profit = sum(profit, na.rm = TRUE)) %>%
    ungroup

-output

# A tibble: 6 x 7
#  shop_id shop_name city    city_sale_goal profit shop_total_sale_goal shop_profit
#    <dbl> <chr>     <chr>            <dbl>  <dbl>                <dbl>       <dbl>
#1       1 Shop A    London              12      3                   21           7
#2       1 Shop A    London              12      1                   21           7
#3       1 Shop A    Paris                9      3                   21           7
#4       2 Shop B    Cardiff             15      6                   15           6
#5       3 Shop C    Dublin              10      5                   10          14
#6       3 Shop C    Dublin              10      9                   10          14

Upvotes: 1

Related Questions