thecheech
thecheech

Reputation: 2211

Summing up a data frame according to a column in R

I am trying to present the following data frame per sum of each city:

> summary(dat1)
      Date                 City           Sales        
 Min.   :2010-06-18   Min.   : 1.00   Min.   :  667.4  
 1st Qu.:2011-02-18   1st Qu.:18.00   1st Qu.: 1138.6  
 Median :2011-10-28   Median :37.00   Median : 1507.5  
 Mean   :2011-10-29   Mean   :44.26   Mean   : 2065.4  
 3rd Qu.:2012-07-06   3rd Qu.:74.00   3rd Qu.: 2347.1  
 Max.   :2013-03-08   Max.   :99.00   Max.   :47206.6 

i.e. I would like to find the data frame with corresponding Date X City observations that will present the sum of sales for each city in on each Day.

Upvotes: 0

Views: 78

Answers (2)

Maciej
Maciej

Reputation: 3303

See aggregation function

aggregate(Sales~Date+City, data=dat1, sum)

Upvotes: 0

user2357031
user2357031

Reputation:

There are several possibilities for this. To name a few:

  1. Function aggregate():

    i) aggregate(Sales~Date+City, data=df, sum)

    ii) aggregate(df$Sales, list(df$Date,df$City), sum)

  2. Function tapply():

    i) tapply(df$Sales, list(df$Date, df$City), sum)

Function tapply() is especially useful, if you have a large dataset, since aggregate tends to choke on very large data sets, but tapply() typically handles these more gracefully. Also, tapply() and aggregate()generate the output in different formats, and you might want to select the one that is most appropriate for possible furter analyses.

These examples can be tested on a simulated data that is presented below:

df<-structure(list(Date = structure(c(4L, 2L, 4L, 2L, 3L, 4L, 3L, 
2L, 2L, 2L, 2L, 4L, 1L, 4L, 2L, 4L, 2L, 3L, 4L, 2L, 3L, 3L, 4L, 
3L, 4L, 2L, 2L, 2L, 3L, 1L, 1L, 4L, 2L, 4L, 1L, 2L, 1L, 2L, 3L, 
2L, 2L, 3L, 2L, 1L, 1L, 3L, 2L, 1L, 1L, 3L, 3L, 1L, 3L, 1L, 1L, 
1L, 3L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 4L, 2L, 1L, 3L, 3L, 1L, 4L, 
1L, 2L, 2L, 1L, 2L, 2L, 2L), .Label = c("2014-01-01", "2014-02-01", 
"2014-03-01", "2014-04-01"), class = "factor"), City = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 
18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L), .Label = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", 
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"), class = "factor"), 
    Sales = c(100, 100, 93, 92, 95, 115, 104, 106, 113, 94, 93, 
    98, 116, 85, 98, 97, 103, 110, 105, 104, 107, 86, 92, 94, 
    106, 115, 112, 92, 103, 100, 101, 97, 95, 110, 103, 92, 91, 
    98, 100, 93, 108, 87, 96, 101, 87, 111, 90, 94, 110, 95, 
    110, 101, 88, 99, 106, 117, 101, 120, 92, 86, 118, 104, 99, 
    89, 103, 102, 121, 99, 106, 99, 107, 105, 109, 110, 112, 
    94, 100, 112)), .Names = c("Date", "City", "Sales"), row.names = c(NA, 
    -78L), class = "data.frame")

Upvotes: 1

Related Questions