Reputation: 11
I want to add another column to my dataframe in R, containing the sum of the amount of orders, when the other columns are the same.
For example if this is my data frame:
orderdate / orderhour / deliverydate / deliveryhour / orders / New column: sum
wednesday / 12:00 / Friday / 17:00 / 5 / 0
wednesday / 12:00 / Friday / 17:00 / 2 / 7
thursday / 13:00 / Friday / 17:00 / 3 / 3
thursday/ 13:00 / Friday / 19:00 / 1 / 1
The last column is the one I want to add. In the end, the rows with zero can be removed, so I no longer have duplicate data. So only if all columns match (except orders) the sum of the orders should be taken. In the other cases the same value can be kept.
Can someone please help me?
Thanks in advance!
Upvotes: 0
Views: 413
Reputation: 16121
# example data
df = read.table(text = "
orderdate/orderhour/deliverydate/deliveryhour/orders
wednesday/12:00/Friday/17:00/5
wednesday/12:00/Friday/17:00/2
thursday/13:00/Friday/17:00/3
thursday/13:00/Friday/19:00/1
", sep="/", header=T, stringsAsFactors=F)
library(dplyr)
df %>%
group_by(orderdate, orderhour, deliverydate, deliveryhour) %>% # for each combination of those variables
mutate(sum_orders = ifelse(n() > 1 & # if there are multiple rows
row_number() == 1, # and this is row 1
0, # use a zero
cumsum(orders))) %>% # else use the cum sum
ungroup() # forget the grouping
# # A tibble: 4 x 6
# orderdate orderhour deliverydate deliveryhour orders sum_orders
# <chr> <chr> <chr> <chr> <dbl> <dbl>
# 1 wednesday 12:00 Friday 17:00 5.00 0
# 2 wednesday 12:00 Friday 17:00 2.00 7.00
# 3 thursday 13:00 Friday 17:00 3.00 3.00
# 4 thursday 13:00 Friday 19:00 1.00 1.00
Upvotes: 1