Reputation: 4226
I am trying to count the number of observations by the day of the year. Here are six observations:
six_obs <- data.frame(Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC", "2015-09-06 00:03:30 UTC",
                               "2015-10-06 00:03:31 UTC", "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC"),
                      Count = c(6, 4, 5, 4, 5, 7),
                      stringsAsFactors = FALSE)
In order to group them by day of the year, I can do something like the following:
library(dplyr)
library(lubridate)
six_obs %>%
  mutate(Date = ymd_hms(Date),
         day_of_year = yday(Date)) %>%
  group_by(day_of_year) %>%
  summarize(number_of_obs = n())
This works fine, but if I have many dates spanning multiple years, it no longer works as intended, because the lubridate function yday returns only an integer between 1 and 366, so the same day from different years ends up in the same group.
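To see the collision concretely, here is a small base R check. It uses as.POSIXlt() instead of lubridate (an assumption made only to keep the snippet dependency-free); POSIXlt stores the day of the year as 0 to 365, hence the + 1 to match yday().

```r
# Two dates in different years that fall on the same day of the year,
# plus the last day of a leap year
dates <- as.Date(c("2015-09-06", "2016-09-05", "2016-12-31"))

# POSIXlt's yday field is zero-based, so add 1 to match lubridate::yday()
doy <- as.POSIXlt(dates)$yday + 1
print(doy)
# [1] 249 249 366
```

Grouping on doy alone would merge the 2015 and 2016 rows, and leap years reach 366.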
Is there a way to group by the day of the year while keeping different years apart? One solution is to use the lubridate functions yday and year and then paste yday and year together, but it seems like there might be a more elegant solution.
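One way to avoid pasting strings is to pass both keys to group_by(). The sketch below assumes dplyr 1.0 or later (for .groups = "drop"), parses the timestamps with base as.POSIXct() rather than ymd_hms(), and adds one hypothetical 2016 row that shares a day-of-year number with 2015-09-06, purely for illustration.

```r
library(dplyr)
library(lubridate)

# The question's six timestamps plus one hypothetical 2016 row (day 249,
# the same day-of-year number as 2015-09-06)
obs <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2015-09-06 00:03:30 UTC", "2015-10-06 00:03:31 UTC",
           "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC",
           "2016-09-05 00:00:00 UTC"),
  stringsAsFactors = FALSE
)

by_day <- obs %>%
  # strptime ignores the trailing " UTC" text; tz sets the time zone
  mutate(Date = as.POSIXct(Date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) %>%
  # Two grouping columns keep the years apart without pasting strings
  group_by(year = year(Date), day_of_year = yday(Date)) %>%
  summarize(number_of_obs = n(), .groups = "drop")

print(by_day)
```

The 2016 row ends up in its own group instead of being merged with the 2015-09-06 observations.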
Upvotes: 1
Views: 1809
Reputation: 43354
dplyr::count is equivalent to group_by(...) %>% summarise(n = n()), so you really only need:
six_obs %>% count(day_of_year = date(Date))
## # A tibble: 2 × 2
## day_of_year n
## <date> <int>
## 1 2015-09-06 3
## 2 2015-10-06 3
where lubridate::date simply converts (or parses, if the Date column is character) to Date class, mostly equivalent to as.Date.
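The equivalence can be checked directly. This sketch swaps in base as.Date() for lubridate::date (so only dplyr is needed) and also shows count()'s wt argument, which sums a column per group instead of counting rows.

```r
library(dplyr)

six_obs <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2015-09-06 00:03:30 UTC", "2015-10-06 00:03:31 UTC",
           "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC"),
  Count = c(6, 4, 5, 4, 5, 7),
  stringsAsFactors = FALSE
)

# One call: count() groups by the new 'day' column and counts rows
via_count <- six_obs %>% count(day = as.Date(Date))

# The longer form that count() abbreviates
via_summarise <- six_obs %>%
  group_by(day = as.Date(Date)) %>%
  summarise(n = n())

# wt = Count makes n the per-day sum of Count instead of the row count
weighted <- six_obs %>% count(day = as.Date(Date), wt = Count)
```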
Upvotes: 2
Reputation: 5018
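You can use the date function or round_date(Date, unit = "day"); note that round_date rounds to the nearest day, so times after noon move to the next date, whereas floor_date(Date, unit = "day") truncates to the start of the day. Alternatively you can just cast it to Date to get rid of the timestamp: as.Date(Date) (EDIT: Not recommended). A third option is to use the truncated argument of the ymd_hms function.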
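The difference between rounding and truncating matters for grouping by date. A minimal sketch with a single evening timestamp, assumed to be in UTC:

```r
library(lubridate)

# An evening timestamp in UTC
x <- as.POSIXct("2015-09-06 18:30:00", tz = "UTC")

# floor_date() truncates to the start of the day (2015-09-06)
floored <- floor_date(x, unit = "day")

# round_date() rounds to the nearest day; 18:30 is past noon, so 2015-09-07
rounded <- round_date(x, unit = "day")

print(floored)
print(rounded)
```

For bucketing observations by calendar date, floor_date() (or date()) keeps each timestamp on the day it actually occurred.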
Upvotes: 3
Reputation: 23210
My understanding from the comments is that you'd like to summarize the data by date.
If you want to sum the counts in Count by date, then we can do so like this:
aggregate(six_obs$Count, by = list(as.Date(six_obs$Date)), sum)
     Group.1  x
1 2015-09-06 15
2 2015-10-06 16
or using date() from lubridate:
aggregate(six_obs$Count, by = list(date(as.character(six_obs$Date))), sum)
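The formula interface of aggregate() gives the output columns meaningful names instead of Group.1 and x. A base R sketch; the helper column Day is a name chosen here for illustration.

```r
six_obs <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2015-09-06 00:03:30 UTC", "2015-10-06 00:03:31 UTC",
           "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC"),
  Count = c(6, 4, 5, 4, 5, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: the timestamp reduced to a Date
six_obs$Day <- as.Date(six_obs$Date)

# Sum Count within each Day; the result has columns Day and Count
totals <- aggregate(Count ~ Day, data = six_obs, FUN = sum)
print(totals)
```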
If you only want to count how many times each date occurs in the Date field, ignoring Count, then just do:
table(as.Date(six_obs$Date))

2015-09-06 2015-10-06 
         3          3 
or
table(date(six_obs$Date))
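If the per-date counts are needed as a data frame rather than a table, as.data.frame() converts the table into a Day/Freq data frame. A base R sketch; the dimension name Day is chosen here for illustration.

```r
six_obs <- data.frame(
  Date = c("2015-09-06 00:00:12 UTC", "2015-09-06 00:01:47 UTC",
           "2015-09-06 00:03:30 UTC", "2015-10-06 00:03:31 UTC",
           "2015-10-06 00:03:36 UTC", "2015-10-06 00:06:18 UTC"),
  stringsAsFactors = FALSE
)

# Naming the table() argument sets the column name in the converted result
per_day <- as.data.frame(table(Day = as.Date(six_obs$Date)),
                         stringsAsFactors = FALSE)

# The table labels come back as character, so convert them to Date again
per_day$Day <- as.Date(per_day$Day)
print(per_day)
```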
Upvotes: 1
Reputation: 24089
Another option is to use cut to split the timestamps into 1-day intervals and then group and summarize the results. No need to use lubridate.
See this example using generated data:
# generate sample data: one reading per minute for 9000 minutes
Date <- seq(from = as.POSIXct("2016-06-01"), by = "1 min", length.out = 9000)
value <- rnorm(9000, 50)
df <- data.frame(Date, value)

# group the results into 1-day intervals and count the rows in each
library(dplyr)
out <- df %>%
  group_by(day = cut(Date, breaks = "1 day")) %>%
  summarize(n = n())
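The same cut() binning also works without dplyr. In this base R sketch the sample data is pinned to UTC (an assumption, so day boundaries do not depend on the local time zone), table() counts rows per day, and tapply() summarises value within the same bins.

```r
# Minute-level sample data in UTC; set.seed makes rnorm reproducible
Date <- seq(from = as.POSIXct("2016-06-01", tz = "UTC"), by = "1 min",
            length.out = 9000)
set.seed(1)
df <- data.frame(Date, value = rnorm(9000, 50))

# cut() turns each timestamp into a factor level for its 1-day interval
day_bin <- cut(df$Date, breaks = "1 day")

# Rows per day: 9000 minutes = 6 full days of 1440 minutes plus 360
counts <- table(day_bin)
print(counts)

# Mean of 'value' within each daily bin
daily_mean <- tapply(df$value, day_bin, mean)
```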
Upvotes: 2