Reputation: 2211
I am trying to present the following data frame per sum of each city:
> summary(dat1)
Date City Sales
Min. :2010-06-18 Min. : 1.00 Min. : 667.4
1st Qu.:2011-02-18 1st Qu.:18.00 1st Qu.: 1138.6
Median :2011-10-28 Median :37.00 Median : 1507.5
Mean :2011-10-29 Mean :44.26 Mean : 2065.4
3rd Qu.:2012-07-06 3rd Qu.:74.00 3rd Qu.: 2347.1
Max. :2013-03-08 Max. :99.00 Max. :47206.6
i.e. I would like to find the data frame with corresponding Date X City observations that will present the sum of sales for each city in on each Day.
Upvotes: 0
Views: 78
Reputation: 3303
See aggregation
function
aggregate(Sales~Date+City, data=dat1, sum)
Upvotes: 0
Reputation:
There are several possibilities for this. To name a few:
Function aggregate():
i) aggregate(Sales~Date+City, data=df, sum)
ii) aggregate(df$Sales, list(df$Date,df$City), sum)
Function tapply():
i) tapply(df$Sales, list(df$Date, df$City), sum)
Function tapply()
is especially useful, if you have a large dataset, since aggregate tends to choke on very large data sets, but tapply()
typically handles these more gracefully. Also, tapply()
and aggregate()
generate the output in different formats, and you might want to select the one that is most appropriate for possible furter analyses.
These examples can be tested on a simulated data that is presented below:
df<-structure(list(Date = structure(c(4L, 2L, 4L, 2L, 3L, 4L, 3L,
2L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 2L, 3L, 4L, 2L, 3L, 3L, 4L,
3L, 4L, 2L, 2L, 2L, 3L, 1L, 1L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 3L,
2L, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L,
1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 4L, 2L, 1L, 3L, 3L, 1L, 4L,
1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("2014-01-01", "2014-02-01",
"2014-03-01", "2014-04-01"), class = "factor"), City = structure(c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L,
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L,
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("a",
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"),
Sales = c(100, 100, 93, 92, 95, 115, 104, 106, 113, 94, 93,
98, 116, 85, 98, 97, 103, 110, 105, 104, 107, 86, 92, 94,
106, 115, 112, 92, 103, 100, 101, 97, 95, 110, 103, 92, 91,
98, 100, 93, 108, 87, 96, 101, 87, 111, 90, 94, 110, 95,
110, 101, 88, 99, 106, 117, 101, 120, 92, 86, 118, 104, 99,
89, 103, 102, 121, 99, 106, 99, 107, 105, 109, 110, 112,
94, 100, 112)), .Names = c("Date", "City", "Sales"), row.names = c(NA,
-78L), class = "data.frame")
Upvotes: 1