Hershey Y
Hershey Y

Reputation: 61

Find out the item first time shows in a data set

I have a data set ProductTable, I want to return the date of all the ProductsFamily has been ordered first time and the very last time. Examples:

ProductTable
 OrderPostingYear OrderPostingMonth OrderPostingDate ProductsFamily Sales QTY
2008               1                       20          R1            5234   1
2008               1                       12          R2            223    2
2009               1                       30          R3            34     1 
2008               2                       1           R1            1634   3
2010               4                       23          R3            224    1 
2009               3                       20          R1            5234   1
2010               7                       12          R2            223    2

Result as followings

OrderTime
ProductsFamily OrderStart OrderEnd      SumSales
R1              2008/1/20  2009/3/20    12102
R2              2008/1/12  2010/7/12    446
R3              2009/1/30  2010/4/23    258

I have no idea how to do it. Any suggestions?

ProductTable <- structure(list(OrderPostingYear = c(2008L, 2008L, 2009L, 2008L, 
2010L, 2009L, 2010L), OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L, 
3L, 7L), OrderPostingDate = c(20L, 12L, 30L, 1L, 23L, 20L, 12L
), ProductsFamily = structure(c(1L, 2L, 3L, 1L, 3L, 1L, 2L), .Label = c("R1", 
"R2", "R3"), class = "factor"), Sales = c(5234L, 223L, 34L, 1634L, 
224L, 5234L, 223L), QTY = c(1L, 2L, 1L, 3L, 1L, 1L, 2L)), .Names = c("OrderPostingYear", 
"OrderPostingMonth", "OrderPostingDate", "ProductsFamily", "Sales", 
"QTY"), class = "data.frame", row.names = c(NA, -7L))

Upvotes: 0

Views: 70

Answers (2)

akrun
akrun

Reputation: 887048

We can also use dplyr/tidyr to do this. We arrange the columns, concatenate the 'Year:Date' columns with unite, group by 'ProductsFamily', get the first, last of 'Date' column and sum of 'Sales' within summarise.

library(dplyr)
library(tidyr)
ProductTable %>% 
   arrange(ProductsFamily, OrderPostingYear, OrderPostingMonth, OrderPostingDate) %>% 
   unite(Date,OrderPostingYear:OrderPostingDate, sep='/') %>% 
   group_by(ProductsFamily) %>%
   summarise(OrderStart=first(Date), OrderEnd=last(Date), SumSales=sum(Sales)) 
# Source: local data frame [3 x 4]

#  ProductsFamily OrderStart  OrderEnd SumSales
#            (fctr)      (chr)     (chr)    (int)   
# 1             R1  2008/1/20 2009/3/20    12102
# 2             R2  2008/1/12 2010/7/12      446
# 3             R3  2009/1/30 2010/4/23      258

Upvotes: 4

Colonel Beauvel
Colonel Beauvel

Reputation: 31161

You can first set up the date in a new column, and then aggregate your data using data.table package (you take the first and last date by ID, as well as the sum of sales):

library(data.table)

# First build up the date
ProductTable$date = with(ProductTable, 
                         as.Date(paste(OrderPostingYear, 
                                       OrderPostingMonth, 
                                       OrderPostingDate, sep = "." ), 
                                 format = "%Y.%m.%d"))

# In a second step, aggregate your data
setDT(ProductTable)[,list(OrderStart = sort(date)[1],
                          OrderEnd   = sort(date)[.N],
                          SumSales   = sum(Sales))
                    ,ProductsFamily]

#   ProductsFamily OrderStart   OrderEnd SumSales
#1:             R1 2008-01-20 2009-03-20    12102
#2:             R2 2008-01-12 2010-07-12      446
#3:             R3 2009-01-30 2010-04-23      258

Upvotes: 3

Related Questions