Jenny Pham

Reputation: 181

Sum by Date and rows in R

I have a table as follows:

dt<-data.frame(Date=c("2011-01-16","2011-01-16","2011-07-08","2011-07-09","2011-07-09","2011-08-17","2011-09-10","2011-09-11","2011-09-11"),Number=c(7,7,NA,1,1,NA,7,5,6),Hour=c(0.25,0.25,NA,0.6,0.6,NA,2,0.25,0.25))
        Date Number Hour
1 2011-01-16      7 0.25
2 2011-01-16      7 0.25
3 2011-07-08     NA   NA
4 2011-07-09      1 0.60
5 2011-07-09      1 0.60
6 2011-08-17     NA   NA
7 2011-09-10      7 2.00
8 2011-09-11      5 0.25
9 2011-09-11      6 0.25

I want to sum Hour by Number and Date, with one column per Number. The output should look like this:

      Date    "1"  "5" "6"  "7"
1 2011-01-16  NA   NA   NA 0.5
2 2011-07-08  NA   NA   NA  NA
3 2011-07-09 1.2   NA   NA  NA
4 2011-08-17  NA   NA   NA  NA
5 2011-09-10  NA   NA   NA 2.0
6 2011-09-11  NA 0.25 0.25  NA

Can you suggest a function that produces this output?

Upvotes: 1

Views: 943

Answers (3)

akrun

Reputation: 887881

We can use the fun.aggregate argument of dcast:

library(data.table)
dcast(setDT(dt), Date + Hour ~ Number, value.var = "Hour", fun.aggregate = sum)

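If the OP wants NA where a Date/Number combination does not exist, the aggregation function needs a condition: dcast fills missing cells with the result of applying fun.aggregate to a zero-length vector, and sum returns 0 for that (sum(integer(0)) is 0).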

dcast(setDT(dt), Date + Hour ~ Number, value.var = "Hour",
      fun.aggregate = function(x)
        if (length(x) == 0) NA_real_ else sum(x, na.rm = TRUE))[,
      .(Date, Hour, `1`, `5`, `6`, `7`)]   # keep these columns only (drops the NA column)
#         Date Hour   1    5    6   7
#1: 2011-01-16 0.25  NA   NA   NA 0.5
#2: 2011-07-08   NA  NA   NA   NA  NA
#3: 2011-07-09 0.60 1.2   NA   NA  NA
#4: 2011-08-17   NA  NA   NA   NA  NA
#5: 2011-09-10 2.00  NA   NA   NA 2.0
#6: 2011-09-11 0.25  NA 0.25 0.25  NA
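To see why the condition is needed, here is a small base R illustration of the two aggregation functions applied to an empty vector, which is what dcast does to compute the fill value. The helper name sum_or_na is hypothetical; it is simply the anonymous function from the call above, given a name.

```r
# sum() over a zero-length vector returns 0, not NA,
# so using plain sum would fill absent Date/Number cells with 0
sum(numeric(0))
# [1] 0

# Hypothetical helper: NA for empty input, otherwise the NA-ignoring sum
sum_or_na <- function(x) if (length(x) == 0) NA_real_ else sum(x, na.rm = TRUE)

sum_or_na(numeric(0))
# [1] NA
sum_or_na(c(0.25, 0.25))
# [1] 0.5
```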

Upvotes: 2

Ronak Shah

Reputation: 389265

We can group_by Date and Number, sum Hour within each group, and then use spread to reshape to wide format. This also produces an `<NA>` column (because Number contains NA values); the final select() removes it, so drop that step if you need the column.

library(dplyr)

dt %>%
  group_by(Date, Number) %>%
  summarise(Hour = sum(Hour, na.rm = TRUE)) %>%
  tidyr::spread(Number, Hour) %>%
  select(-`<NA>`)

#  Date         `1`   `5`   `6`   `7`
#  <fct>      <dbl> <dbl> <dbl> <dbl>
#1 2011-01-16  NA   NA    NA      0.5
#2 2011-07-08  NA   NA    NA     NA  
#3 2011-07-09  1.2  NA    NA     NA  
#4 2011-08-17  NA   NA    NA     NA  
#5 2011-09-10  NA   NA    NA      2  
#6 2011-09-11  NA   0.25  0.25  NA  
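For comparison, a base R sketch (not part of this answer) that does the grouping and the reshaping in one step with tapply(). It assumes the data frame from the question; rows whose Number is NA drop out of the grouping, and tapply fills absent Date/Number cells with NA, as in the desired output. The names m and wide are illustrative.

```r
# Same data as in the question
dt <- data.frame(
  Date   = c("2011-01-16", "2011-01-16", "2011-07-08", "2011-07-09", "2011-07-09",
             "2011-08-17", "2011-09-10", "2011-09-11", "2011-09-11"),
  Number = c(7, 7, NA, 1, 1, NA, 7, 5, 6),
  Hour   = c(0.25, 0.25, NA, 0.6, 0.6, NA, 2, 0.25, 0.25)
)

# Cross-tabulate: rows = Date, columns = Number, cells = sum of Hour.
# Rows with NA in Number are not assigned to any group; empty cells become NA.
m <- with(dt, tapply(Hour, list(Date, Number), sum))

# Turn the matrix into a data frame with Date as an ordinary column
wide <- data.frame(Date = rownames(m), m, check.names = FALSE)
rownames(wide) <- NULL
wide
```

Dates that only have NA in Number (2011-07-08, 2011-08-17) still appear as all-NA rows because they are levels of the Date grouping factor.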

Upvotes: 1

Neeraj

Reputation: 1236

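You can use the base R aggregate function for this. Note that it returns long format (one row per Date/Number combination) rather than the wide layout shown in the question.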

dt$Date <- as.character(dt$Date)  # Date is a factor if dt was built with stringsAsFactors = TRUE (the default before R 4.0.0)
aggregate(dt$Hour, by = list(dt$Number, dt$Date), FUN = function(x) sum(x, na.rm = TRUE))

Or, without na.rm (rows whose Number is NA are excluded from the groups anyway, so the result is the same here):

with(dt, aggregate(Hour, by = list(Number, Date), FUN = sum))
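A hedged variant using aggregate()'s formula interface, which keeps the original column names in the result instead of Group.1, Group.2 and x. It assumes the data frame from the question; the default na.action (na.omit) drops the rows where Number and Hour are NA. The result is still in long format, and res is an illustrative name.

```r
# Same data as in the question
dt <- data.frame(
  Date   = c("2011-01-16", "2011-01-16", "2011-07-08", "2011-07-09", "2011-07-09",
             "2011-08-17", "2011-09-10", "2011-09-11", "2011-09-11"),
  Number = c(7, 7, NA, 1, 1, NA, 7, 5, 6),
  Hour   = c(0.25, 0.25, NA, 0.6, 0.6, NA, 2, 0.25, 0.25)
)

# One row per Date/Number combination that occurs, with Hour summed;
# columns are named Date, Number and Hour
res <- aggregate(Hour ~ Date + Number, data = dt, FUN = sum)
res
```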

Upvotes: 2
