Reputation: 384
I have made a dataframe which has a column with dates and several columns with numeric values. I want to group this dataframe by month and sum all the numeric values in the other columns for each month.
Here is an example of my dataframe:
capture.date Test1 Test2 Test3
2016-03-18 0 1 1
2016-03-18 1 1 1
2016-03-20 2 1 1
2016-04-12 1 0 1
I have already tried this code:
df %>%
group_by(capture.date) %>%
summarise_each(funs(sum))
and:
aggregate(df[2:4], by=df["capture.date"], sum)
but both of these return a dataframe summarized by day rather than by month. How can I make it summarize by month instead?
Desired output:
capture.date Test1 Test2 Test3
2016-03 3 3 3
2016-04 1 0 1
Upvotes: 5
Views: 1632
Reputation: 35554
You can convert the dates to the %Y-%m format inside group_by() and use summarise_if() or summarise_at() to select which variables get summed. (Confirm that capture.date is of class Date.)
df %>%
group_by(Date = strftime(capture.date, "%Y-%m")) %>%
summarise_if(is.numeric, sum)
# # A tibble: 2 x 4
# Date Test1 Test2 Test3
# <chr> <int> <int> <int>
# 1 2016-03 3 3 3
# 2 2016-04 1 0 1
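Regarding the note about capture.date being of class Date: a minimal base R sketch for checking the class and converting it, assuming the data was read with read.table(..., as.is = TRUE) as in the reproducible input elsewhere in this thread, so the column starts out as character. The name month_key is just an illustrative variable; format() builds the same kind of "%Y-%m" key that the group_by() call above uses.

```r
# Reproducible input from the question; as.is = TRUE keeps capture.date as character
df <- read.table(text = "
capture.date Test1 Test2 Test3
2016-03-18 0 1 1
2016-03-18 1 1 1
2016-03-20 2 1 1
2016-04-12 1 0 1", header = TRUE, as.is = TRUE)

class(df$capture.date)  # "character" at this point

# Convert to Date only if it is not one already (ISO "YYYY-MM-DD" parses by default)
if (!inherits(df$capture.date, "Date")) {
  df$capture.date <- as.Date(df$capture.date)
}
class(df$capture.date)  # "Date"

# Hypothetical helper variable: the year-month key used for grouping
month_key <- format(df$capture.date, "%Y-%m")
month_key  # "2016-03" "2016-03" "2016-03" "2016-04"
```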
Scoped verbs (_if, _at, _all) have been superseded by the use of pick() or across() in an existing verb, so with current dplyr the same thing can be written as:
df %>%
group_by(Date = strftime(capture.date, "%Y-%m")) %>%
summarise(across(where(is.numeric), sum))
Upvotes: 4
Reputation: 269441
1) dplyr/zoo Using the data shown reproducibly in the Note at the end, convert each date to class yearmon, which represents dates without a day, and then summarize the numeric columns:
library(dplyr)
library(zoo)
df %>%
group_by(yearmon = as.yearmon(capture.date)) %>%
summarize_if(is.numeric, sum) %>%
ungroup
giving this tibble:
# A tibble: 2 x 4
yearmon Test1 Test2 Test3
<yearmon> <int> <int> <int>
1 Mar 2016 3 3 3
2 Apr 2016 1 0 1
2) zoo This could alternatively be done with a single read.zoo command. Use fortify.zoo on the result if you want a data.frame:
library(zoo)
read.zoo(df, FUN = as.yearmon, aggregate = sum)
giving this zoo series:
Test1 Test2 Test3
Mar 2016 3 3 3
Apr 2016 1 0 1
2a) zoo with magrittr pipeline This can also be written as a magrittr (or dplyr) pipeline:
library(magrittr)
library(zoo)
df %>% read.zoo(FUN = as.yearmon, aggregate = sum)
or, to convert the result to a data.frame:
library(magrittr)
library(zoo)
df %>% read.zoo(FUN = as.yearmon, aggregate = sum) %>% fortify.zoo
3) Base R Using only base R, extract the first 7 characters of each date and then aggregate on that:
df2 <- transform(df, year.month = substr(capture.date, 1, 7), capture.date = NULL)
aggregate(. ~ year.month, df2, sum)
giving this data.frame:
year.month Test1 Test2 Test3
1 2016-03 3 3 3
2 2016-04 1 0 1
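The aggregate() attempt in the question grouped on the full date. A minimal base R sketch that keeps that call and only changes the grouping key to year-month, assuming df is the reproducible input from the Note (capture.date read as character); monthly is an illustrative name:

```r
df <- read.table(text = "
capture.date Test1 Test2 Test3
2016-03-18 0 1 1
2016-03-18 1 1 1
2016-03-20 2 1 1
2016-04-12 1 0 1", header = TRUE, as.is = TRUE)

# Same call as in the question, but grouped on "YYYY-MM" instead of the full date;
# format() on a parsed Date drops the day part
monthly <- aggregate(df[2:4],
                     by = list(capture.date = format(as.Date(df$capture.date), "%Y-%m")),
                     FUN = sum)
monthly
```

This gives one row per month (2016-03 with sums 3, 3, 3 and 2016-04 with 1, 0, 1), with the grouping column named capture.date as in the desired output. Because the key comes from a parsed Date rather than from substr() on the text, it relies on the dates parsing correctly rather than on their exact string layout.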
Note: the input in reproducible form:
Lines <- "
capture.date Test1 Test2 Test3
2016-03-18 0 1 1
2016-03-18 1 1 1
2016-03-20 2 1 1
2016-04-12 1 0 1"
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
Upvotes: 3
Reputation: 2881
The following should work:
library(lubridate)
library(tidyverse)
txt <- "capture.date Test1 Test2 Test3
2016-03-18 0 1 1
2016-03-18 1 1 1
2016-03-20 2 1 1
2016-04-12 1 0 1"
data <- read.table(text = txt, header = TRUE)
data %>%
mutate(month = month(capture.date),
year = year(capture.date)) %>%
group_by(month, year) %>%
summarise_if(is.integer, sum) %>%
ungroup %>%
mutate("capture.date" = paste(year, str_pad(month, 2, side = "left", pad = "0"), sep = "-")) %>%
select(capture.date, Test1, Test2, Test3)
This will produce
# A tibble: 2 x 4
capture.date Test1 Test2 Test3
<chr> <int> <int> <int>
1 2016-03 3 3 3
2 2016-04 1 0 1
You may need to change the predicate in summarise_if() from is.integer to something else (for example is.numeric) for your real data.
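To illustrate why the predicate matters, here is a small sketch on a hypothetical variant of the example data in which Test3 holds decimal values. read.table() then stores Test3 as double, so is.integer() no longer selects it while is.numeric() does:

```r
# Hypothetical variant of the example data: Test3 contains decimals
txt <- "capture.date Test1 Test2 Test3
2016-03-18 0 1 1.5
2016-03-20 2 1 0.5
2016-04-12 1 0 1.0"
data <- read.table(text = txt, header = TRUE)

# Test3 is double here, so is.integer() skips it: FALSE TRUE TRUE FALSE
sapply(data, is.integer)
# is.numeric() picks all three test columns: FALSE TRUE TRUE TRUE
sapply(data, is.numeric)
```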
Upvotes: 3