Miguel
Miguel

Reputation: 436

Removing NAs within groups across mulitple columns in R

I often work with time series datasets that look like this:

           CNGBOND. FRGBOND.
2006-02-15       NA     3.76
2006-03-31     4.26       NA
2006-05-15       NA     4.25
2006-06-30     4.69       NA
2006-08-15       NA     4.17
2006-09-29     4.08       NA
2006-11-15       NA     4.06
2006-12-29     4.11       NA

I'd like to group the rows into quarters and remove the NAs to get:

        CNGBOND. FRGBOND.
2006 Q1     4.26     3.76
2006 Q2     4.69     4.25
2006 Q3     4.08     4.17
2006 Q4     4.11     4.06

Is there a simple/elegant way to achieve this, preferably using a dplyr pipeline?

I considered converting to quarters (using zoo:as.yearqtr) and then grouping by the quarterly dates (using dplyr::group_by), but then got stuck with how to filter out the NAs for each column within each group.

Upvotes: 1

Views: 38

Answers (2)

Georgery
Georgery

Reputation: 8117

library(tidyverse)
library(lubridate)

df <- tribble(
    ~date, ~CNGBOND, ~FRGBOND
    , "2006-02-15",       NA,     3.76
    , "2006-03-31",     4.26,       NA
    , "2006-05-15",       NA,     4.25
    , "2006-06-30",     4.69,       NA
    , "2006-08-15",       NA,     4.17
    , "2006-09-29",     4.08,       NA
    , "2006-11-15",       NA,     4.06
    , "2006-12-29",     4.11,       NA

)

firstNonNA <- function(x) x[!is.na(x)][1]

df %>%
    mutate(date = ymd(date)) %>%
    mutate(Quarter = quarter(date, with_year = TRUE)) %>%
    group_by(Quarter) %>%
    summarise_at(vars(CNGBOND, FRGBOND), firstNonNA)

Upvotes: 1

tmfmnk
tmfmnk

Reputation: 39858

One solution involving zoo and dplyr could be:

df %>%
 rownames_to_column(var = "Date") %>%
 group_by(Date = as.yearqtr(Date, format = "%Y-%m-%d")) %>%
 summarise_all(~ na.omit(.))

  Date      CNGBOND. FRGBOND.
  <yearqtr>    <dbl>    <dbl>
1 2006 Q1       4.26     3.76
2 2006 Q2       4.69     4.25
3 2006 Q3       4.08     4.17
4 2006 Q4       4.11     4.06

Upvotes: 4

Related Questions