Joshua Rosenberg

Reputation: 4226

Count observations by day of year using lubridate in R

I am trying to count the number of observations by the day of the year. Here are six observations:

six_obs <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2015-09-06 00:03:30 UTC", "2015-10-06 00:03:31 UTC",
           "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC"),
  Count = c(6, 4, 5, 4, 5, 7),
  stringsAsFactors = FALSE
)

In order to group them by day of the year, I can do something like the following:

library(dplyr)
library(lubridate)

six_obs %>%
    mutate(Date = ymd_hms(Date),
           day_of_year = yday(Date)) %>%
    group_by(day_of_year) %>%
    summarize(number_of_obs = n())

This works fine for a single year, but with dates spanning multiple years it breaks down: the lubridate function yday returns an integer between 1 and 366, so the same day number from different years falls into one group.

Is there a way to group by the day of the year? One solution is to use the lubridate functions yday and year and then to paste yday and year together, but it seems like there might be a more elegant solution.
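For reference, the workaround described above could be sketched roughly as follows. Instead of pasting the two values together, this version passes year and yday as two separate grouping columns. The two_years data frame is made up for illustration, and the .groups argument assumes a reasonably recent dplyr.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: both dates have yday 249, but in different years
two_years <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2016-09-05 00:03:30 UTC"),
  stringsAsFactors = FALSE
)

by_year_day <- two_years %>%
  mutate(Date = ymd_hms(Date)) %>%
  # grouping on year as well keeps identical day numbers from merging
  group_by(year = year(Date), day_of_year = yday(Date)) %>%
  summarize(number_of_obs = n(), .groups = "drop")

by_year_day
```

Grouping on yday alone would put all three rows in one group here; adding year keeps them apart.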

Upvotes: 1

Views: 1809

Answers (4)

alistaire

Reputation: 43354

dplyr::count is equivalent to group_by(...) %>% summarise(n = n()), so you really only need

six_obs %>% count(day_of_year = date(Date))

## # A tibble: 2 × 2
##   day_of_year     n
##        <date> <int>
## 1  2015-09-06     3
## 2  2015-10-06     3

where lubridate::date simply converts (or parses, if the Date column is character) to Date class, mostly equivalent to as.Date.

Upvotes: 2

yeedle

Reputation: 5018

You can use the date function or floor_date(Date, unit = "day"). (Note that round_date(Date, unit = "day") rounds to the nearest day, so timestamps after noon would land on the following day.) Alternatively, you can just cast it to Date to get rid of the timestamp: as.Date(Date) (EDIT: not recommended). Another option is to use the truncated argument of the ymd_hms function.
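A minimal sketch comparing these helpers on made-up timestamps (the stamps vector is hypothetical, not from the question). floor_date is included for contrast with round_date, since round_date goes to the nearest day rather than the start of the day.

```r
library(lubridate)

# Hypothetical timestamps; the second one is after noon
stamps <- ymd_hms(c("2015-09-06 00:00:12", "2015-09-06 18:30:00"))

as_dates <- date(stamps)                      # Date class, time of day dropped
floored  <- floor_date(stamps, unit = "day")  # POSIXct at midnight of the same day
rounded  <- round_date(stamps, unit = "day")  # nearest midnight: 18:30 moves to the next day
```

For counting observations per calendar day, date or floor_date keeps both timestamps on 2015-09-06, while round_date would split them across two days.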

Upvotes: 3

Hack-R

Reputation: 23210

My understanding from the comments is that you'd like to summarize the data by date.

If you want to sum the counts in Count by date then we can do so like this:

aggregate(six_obs$Count, by=list(as.Date(six_obs$Date)),sum)
     Group.1  x
1 2015-09-06 15
2 2015-10-06 16

or using date() from lubridate:

aggregate(six_obs$Count, by=list(date(as.character(six_obs$Date))),sum)

If you only want to count how many times each date occurs in the Date column, ignoring Count, then just do:

table(as.Date(six_obs$Date))
2015-09-06 2015-10-06 
         3          3

or

table(date(six_obs$Date))

Upvotes: 1

Dave2e

Reputation: 24089

Another option is to use the cut function to bin the timestamps into one-day intervals and then summarize each bin. No need to use lubridate.
See this example using generated data:

# generate sample data: 9000 timestamps, one minute apart
Date <- seq(from = as.POSIXct("2016-06-01"), by = "1 min", length.out = 9000)
value <- rnorm(9000, 50)
df <- data.frame(Date, value)

# group the results into 1-day intervals and count the rows in each
library(dplyr)
out <- df %>%
  group_by(day = cut(Date, breaks = "1 day")) %>%
  summarize(n = n())

Upvotes: 2
