Reputation: 895
This is the small sampleData of half million rows,here product,date and salesamount is a factor variable. I need to divide the data by year 2011 and 2012 and display the product with its total sum of the sales amount in that year. I tried a lot at last i did it in excel using pivot table. but i want know that how this can be achived in R, as i don't want more excel files. Thankyou
product date salesamount
a 2011-01 35
b 2011-02 40
c 2011-03 50
d 2011-01 55
b 2012-02 46
a 2011-02 35
d 2012-01 62
c 2012-03 56
c 2012-02 56
a 2012-03 38
b 2012-01 46
e 2011-03 60
a 2012-03 38
e 2012-02 67
d 2011-01 55
Data Should Look something like this after dividing it by year
product year2011 year2012
a 70 76
b 40 92
c 50 112
d 110 62
e 60 67
Upvotes: 1
Views: 67
Reputation: 263391
Assuming the dat object is dat
then the base tapply
approach would be:
with( dat, tapply( salesamount, list( product, substring(date, 1,4) ), sum) )
2011 2012
a 70 76
b 40 92
c 50 112
d 110 62
e 60 67
Upvotes: 0
Reputation: 12569
Here is a solution with data.table
:
library("data.table")
DT[, year:=substr(date, 1, 4)]
dcast(DT, product ~ year, value.var="salesamount", fun.aggregate = sum)
# > dcast(DT, product ~ year, value.var="salesamount", fun.aggregate = sum)
# product 2011 2012
# 1: a 70 76
# 2: b 40 92
# 3: c 50 112
# 4: d 110 62
# 5: e 60 67
Data:
DT <- fread(
" product date salesamount
a 2011-01 35
b 2011-02 40
c 2011-03 50
d 2011-01 55
b 2012-02 46
a 2011-02 35
d 2012-01 62
c 2012-03 56
c 2012-02 56
a 2012-03 38
b 2012-01 46
e 2011-03 60
a 2012-03 38
e 2012-02 67
d 2011-01 55")
Upvotes: 1
Reputation: 35352
.
library(dplyr)
library(tidyr)
df %>%
separate(date, c('year', 'month')) %>%
group_by(product, year) %>%
summarise(salesamount = sum(salesamount)) %>%
spread(year, salesamount, sep = '')
# A tibble: 5 x 3 # Groups: product [5] product year2011 year2012 <chr> <int> <int> 1 a 70 76 2 b 40 92 3 c 50 112 4 d 110 62 5 e 60 67
Upvotes: 2