Reputation: 57
Hi I have a dataset where one row presents a item in an order. One order can have more items and therefore more rows in a dataset.
data looks like:
code nr. Of items
1 252222016 5
3 252812016 1
5 252812016 1
6 253012016 1
12 253042016 20
13 253042016 20
15 253052016 1
16 253072016 3
18 253082016 4
I would like a result where I would summ the "nr. Of items" where there is a same "code" and write the resoult in the row. if one order has more items I would like to have a resoult written in the last row of an order and previous rows would be NAs.
how I want the final solution to look like:
code nr. Of items result
1 252222016 5 5
3 252812016 1 na
5 252812016 1 2
6 253012016 1 1
12 253042016 20 na
13 253042016 20 40
15 253052016 1 1
16 253072016 3 3
18 253082016 4 4
I would appreciate the help!
Upvotes: 2
Views: 77
Reputation: 56149
Another dplyr alternative, using ifelse
and fromLast = TRUE
:
library(dplyr)
df1 <- df1 %>%
group_by(code) %>%
mutate(result = ifelse(duplicated(code, fromLast = TRUE), NA, sum(nr.Of.items)))
Upvotes: 2
Reputation: 3587
Another approach using dplyr
package (using df1
of @bouncyball) is :
library(dplyr)
df1 %>% group_by(code) %>% mutate(result=ifelse(row_number()==n(),sum(items),NA))
Upvotes: 0
Reputation: 887118
We can do this with data.table
. Convert the 'data.frame' to 'data.table' (setDT(df1)
), grouped by 'code', if
the number of rows is greater than 1 (.N >1
), then replicate NA
by one less than the number of rows and concatenate (c(
) with sum
of 'items' or else
return 'items'. Assign (:=
the output to create the new column 'result'.
library(data.table)
setDT(df1)[, result := if(.N>1) c(rep(NA, .N-1), sum(items)) else items, by = code]
df1
# code items result
#1: 252222016 5 5
#2: 252812016 1 NA
#3: 252812016 1 2
#4: 253012016 1 1
#5: 253042016 20 NA
#6: 253042016 20 40
#7: 253052016 1 1
#8: 253072016 3 3
#9: 253082016 4 4
Or instead of using the if/else
we can also create NA on positions other than the last row and multiply with the sum
of 'items'
setDT(df1)[, result := NA^(seq_len(.N) != .N)*sum(items) , by = code]
df1 <- structure(list(
code = c(252222016L, 252812016L, 252812016L, 253012016L,
253042016L, 253042016L, 253052016L, 253072016L, 253082016L),
items = c(5L, 1L, 1L, 1L, 20L, 20L, 1L, 3L, 4L)),
.Names = c("code", "items"), class = "data.frame",
row.names = c("1", "3", "5", "6", "12", "13", "15", "16", "18"))
Upvotes: 0
Reputation: 10761
We can accomplish this using the dplyr
package:
library(dplyr) # load package
df1 %>%
group_by(code) %>%
mutate(rownum = 1,
c_s_rn = cumsum(rownum),
result = ifelse(c_s_rn == max(c_s_rn), sum(items), NA)) %>%
select(-rownum, -c_s_rn)
# code items result
# 1 252222016 5 5
# 2 252812016 1 NA
# 3 252812016 1 2
# 4 253012016 1 1
# 5 253042016 20 NA
# 6 253042016 20 40
# 7 253052016 1 1
# 8 253072016 3 3
# 9 253082016 4 4
There's also a base R
solution using lapply
and split
:
df1_2 <- df1
df1_2$rownum <- 1
do.call('rbind',
lapply(split(df1_2, df1_2$code), function(x)
data.frame(x,
result = ifelse(cumsum(x$rownum) == sum(x$rownum), sum(x$items), NA)))
)[,-3]
# code items result
# 252222016 252222016 5 5
# 252812016.3 252812016 1 NA
# 252812016.5 252812016 1 2
# 253012016 253012016 1 1
# 253042016.12 253042016 20 NA
# 253042016.13 253042016 20 40
# 253052016 253052016 1 1
# 253072016 253072016 3 3
# 253082016 253082016 4 4
df1 <- structure(list(
code = c(252222016L, 252812016L, 252812016L, 253012016L,
253042016L, 253042016L, 253052016L, 253072016L, 253082016L),
items = c(5L, 1L, 1L, 1L, 20L, 20L, 1L, 3L, 4L)),
.Names = c("code", "items"), class = "data.frame",
row.names = c("1", "3", "5", "6", "12", "13", "15", "16", "18"))
Upvotes: 2
Reputation: 51592
You can use the duplicated
function with the parameter fromLast = TRUE
, i.e.
library(dplyr)
df %>%
group_by(code) %>%
mutate(new = replace(cumsum(nr. Of items), duplicated(code, fromLast = TRUE), NA))
#Source: local data frame [9 x 3]
#Groups: code [7]
# code nr new
# <int> <int> <int>
#1 252222016 5 5
#2 252812016 1 NA
#3 252812016 1 2
#4 253012016 1 1
#5 253042016 20 NA
#6 253042016 20 40
#7 253052016 1 1
#8 253072016 3 3
#9 253082016 4 4
Upvotes: 3