Exclude the columns whose column names contain a certain character, and drop the rows that contain NAs in the remaining columns

Question

Suppose I have a dataframe df as below:

          date      CPI:YoY  PPI:MoM
1   2018-01-15           NA       NA
2   2018-01-31          1.2      0.2
3   2018-01-15           NA       NA
4   2018-02-28          0.9     -0.1
5   2018-03-15           NA       NA
6   2018-03-31          1.0      0.3

I need to implement the following two steps with dplyr: select the columns that do not contain date, and then drop the rows where all values in those columns are NAs.

To select columns not containing date can be implemented with the code: dplyr::select(grep("date", names(df))) or dplyr::select(contains("date", ignore.case = TRUE)); to drop NA rows in the selected columns we can use: filter_at(vars(cols), all_vars(!is.na(.))) or dplyr::drop_na(cols).

Now my question is how to combine the two codes into one? Thanks.

Note: For this example, we can simply use drop_na(CPI:YoY, PPI:MoM) to get the expected result, but since in real data, we usually have many columns, and the position of date column doesn't have to be the first column all the time, so I'd prefer to subset the columns by excluding the date column.

The expected result:

          date      CPI:YoY  PPI:MoM
1   2018-01-31          1.2      0.2
2   2018-02-28          0.9     -0.1
3   2018-03-31          1.0      0.3

References:

Subset data to contain only columns whose names match a condition

filtering data frame based on NA on multiple columns

Darren Tsai · Accepted Answer

You could use filter + if_any/if_all:

df %>%
  filter( !if_all(-contains("date"), is.na) )

df %>%
  filter( if_any(-contains("date"), ~ !is.na(.)) )

df %>%
  filter( if_any(-contains("date"), Negate(is.na)) )

Note: Negate(is.na) = function(x) !is.na(x) = ~ !is.na(.)

Output

        date CPI:YoY PPI:MoM
2 2018-01-31     1.2     0.2
4 2018-02-28     0.9    -0.1
6 2018-03-31     1.0     0.3

Data

df <- read.table(text =
"         date      CPI:YoY  PPI:MoM
1   2018-01-15           NA       NA
2   2018-01-31          1.2      0.2
3   2018-01-15           NA       NA
4   2018-02-28          0.9     -0.1
5   2018-03-15           NA       NA
6   2018-03-31          1.0      0.3", check.names = FALSE)

Exclude the columns whose column names contain a certain character, and drop the rows that contain NAs in the remaining columns

Answers (1)

Output

Data

Related Questions