Reputation: 436
I often work with time series datasets that look like this:
CNGBOND. FRGBOND.
2006-02-15 NA 3.76
2006-03-31 4.26 NA
2006-05-15 NA 4.25
2006-06-30 4.69 NA
2006-08-15 NA 4.17
2006-09-29 4.08 NA
2006-11-15 NA 4.06
2006-12-29 4.11 NA
I'd like to group the rows into quarters and remove the NAs to get:
CNGBOND. FRGBOND.
2006 Q1 4.26 3.76
2006 Q2 4.69 4.25
2006 Q3 4.08 4.17
2006 Q4 4.11 4.06
Is there a simple/elegant way to achieve this, preferably using a dplyr
pipeline?
I considered converting to quarters (using zoo:as.yearqtr
) and then grouping by the quarterly dates (using dplyr::group_by
), but then got stuck with how to filter out the NAs for each column within each group.
Upvotes: 1
Views: 38
Reputation: 8117
library(tidyverse)
library(lubridate)
df <- tribble(
~date, ~CNGBOND, ~FRGBOND
, "2006-02-15", NA, 3.76
, "2006-03-31", 4.26, NA
, "2006-05-15", NA, 4.25
, "2006-06-30", 4.69, NA
, "2006-08-15", NA, 4.17
, "2006-09-29", 4.08, NA
, "2006-11-15", NA, 4.06
, "2006-12-29", 4.11, NA
)
firstNonNA <- function(x) x[!is.na(x)][1]
df %>%
mutate(date = ymd(date)) %>%
mutate(Quarter = quarter(date, with_year = TRUE)) %>%
group_by(Quarter) %>%
summarise_at(vars(CNGBOND, FRGBOND), firstNonNA)
Upvotes: 1
Reputation: 39858
One solution involving zoo
and dplyr
could be:
df %>%
rownames_to_column(var = "Date") %>%
group_by(Date = as.yearqtr(Date, format = "%Y-%m-%d")) %>%
summarise_all(~ na.omit(.))
Date CNGBOND. FRGBOND.
<yearqtr> <dbl> <dbl>
1 2006 Q1 4.26 3.76
2 2006 Q2 4.69 4.25
3 2006 Q3 4.08 4.17
4 2006 Q4 4.11 4.06
Upvotes: 4