timothyylim

Reputation: 1527

Aggregating hours to days in R?

I have the following table:

   date                    status    
1 2015-07-13 12:27:30      1
2 2015-07-22 14:36:09      1
3 2015-07-27 09:03:07      1
4 2015-07-27 17:06:04      1
5 2015-07-28 10:01:38      1

I want to count the number of occurrences per day:

   date            status  sum    
1 2015-07-13       1       1   
2 2015-07-22       1       1
3 2015-07-27       1       2
4 2015-07-28       1       1

Upvotes: 1

Views: 226

Answers (2)

mpalanco

Reputation: 13580

For completeness, here are two base R solutions:

ave and aggregate

# Per-day total of status, repeated on every row of that day
df1$sum <- ave(df1$status, as.Date(df1$date), FUN = sum)
# Keep only the first row of each day
aggregate(df1[-1], list(as.Date(df1$date)), FUN = head, 1)

Output:

     Group.1 status sum
1 2015-07-13      1   1
2 2015-07-22      1   1
3 2015-07-27      1   2
4 2015-07-28      1   1

ave, then removing duplicate dates after converting the date column

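# Per-day total of status, repeated on every row of that day
df1$sum <- ave(df1$status, as.Date(df1$date), FUN = sum)
# Drop the time of day, then keep the first row for each date
df1$date <- as.Date(df1$date)
df1[!duplicated(df1$date), ]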

Output:

        date status sum
1 2015-07-13      1   1
2 2015-07-22      1   1
3 2015-07-27      1   2
5 2015-07-28      1   1
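
If only the per-day counts are needed, a shorter base route is to tabulate the dates directly. This is a minimal sketch rather than part of the two approaches above: it rebuilds the question's data with an explicit tz = "UTC" (an assumption, so the result does not depend on the local time zone), and the names day and counts are only illustrative.

```r
# Rebuild the question's data; tz = "UTC" is an assumption made for reproducibility
df1 <- data.frame(
  date = as.POSIXct(c("2015-07-13 12:27:30", "2015-07-22 14:36:09",
                      "2015-07-27 09:03:07", "2015-07-27 17:06:04",
                      "2015-07-28 10:01:38"), tz = "UTC"),
  status = 1L
)

# Drop the time of day, then count the rows that fall on each calendar day
day <- as.Date(df1$date, tz = "UTC")
counts <- as.data.frame(table(date = day), responseName = "sum")

# table() returns the dates as a factor, so convert them back to Date
counts$date <- as.Date(as.character(counts$date))
counts
```

Unlike the ave-based versions, this does not carry the status column along, so it only fits when the count itself is the goal.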

Upvotes: 0

akrun

Reputation: 887851

Assuming that the 'date' column is of class POSIXct, we can use dplyr to aggregate by group. We group by 'date' after converting it to Date class, then use summarise to take the first observation of 'status' and to create the 'sum' column as the number of rows (n()) in each group.

library(dplyr)
df2 <- df1 %>%
         group_by(date = as.Date(date)) %>%
         summarise(status = first(status), sum = n())
df2
#         date status sum
#1 2015-07-13      1   1
#2 2015-07-22      1   1
#3 2015-07-27      1   2
#4 2015-07-28      1   1

We could also do this with data.table. We convert the 'data.frame' to a 'data.table' (setDT(df1)) and group by the 'date' column after converting it to Date class. Within each group, we take the first observation of 'status' and the number of rows (.N) as the 'sum' column.

library(data.table)
setDT(df1)[, .(status = status[1L], sum = .N), by = .(date = as.Date(date))]
#         date status sum
#1: 2015-07-13      1   1
#2: 2015-07-22      1   1
#3: 2015-07-27      1   2
#4: 2015-07-28      1   1

data

df1 <- structure(list(
    date = structure(c(1436804850, 1437590169, 1438002187,
                       1438031164, 1438092098),
                     class = c("POSIXct", "POSIXt"), tzone = ""),
    status = c(1L, 1L, 1L, 1L, 1L)),
  .Names = c("date", "status"),
  row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
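
One caveat that applies to every snippet here: as.Date() on a POSIXct value accepts a tz argument, and the calendar day a timestamp falls on depends on the time zone used. The sketch below uses a made-up timestamp close to midnight (it is not part of the question's data) to show the effect; passing tz explicitly keeps the grouping predictable.

```r
# Hypothetical timestamp at 23:30 New York time (not from the question's data)
x <- as.POSIXct("2015-07-27 23:30:00", tz = "America/New_York")

day_local <- as.Date(x, tz = "America/New_York")  # calendar day in New York
day_utc   <- as.Date(x, tz = "UTC")               # calendar day in UTC, one day later

day_local
day_utc
```

In the example data none of the timestamps sit that close to midnight, so the choice is unlikely to change the counts shown above, but it can matter for other data.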

Upvotes: 1
