ah bon
ah bon

Reputation: 10011

Exclude the columns whose column names contain a certain character, and drop the rows that contain NAs in the remaining columns

Suppose I have a dataframe df as below:

          date      CPI:YoY  PPI:MoM
1   2018-01-15           NA       NA
2   2018-01-31          1.2      0.2
3   2018-01-15           NA       NA
4   2018-02-28          0.9     -0.1
5   2018-03-15           NA       NA
6   2018-03-31          1.0      0.3

I need to implement the following two steps with dplyr: select the columns that do not contain date, and then drop the rows where all values in those columns are NAs.

To select columns not containing date can be implemented with the code: dplyr::select(grep("date", names(df))) or dplyr::select(contains("date", ignore.case = TRUE)); to drop NA rows in the selected columns we can use: filter_at(vars(cols), all_vars(!is.na(.))) or dplyr::drop_na(cols).

Now my question is how to combine the two codes into one? Thanks.

Note: For this example, we can simply use drop_na(CPI:YoY, PPI:MoM) to get the expected result, but since in real data, we usually have many columns, and the position of date column doesn't have to be the first column all the time, so I'd prefer to subset the columns by excluding the date column.

The expected result:

          date      CPI:YoY  PPI:MoM
1   2018-01-31          1.2      0.2
2   2018-02-28          0.9     -0.1
3   2018-03-31          1.0      0.3

References:

Subset data to contain only columns whose names match a condition

filtering data frame based on NA on multiple columns

Upvotes: 2

Views: 927

Answers (1)

Darren Tsai
Darren Tsai

Reputation: 35554

You could use filter + if_any/if_all:

df %>%
  filter( !if_all(-contains("date"), is.na) )
df %>%
  filter( if_any(-contains("date"), ~ !is.na(.)) )
df %>%
  filter( if_any(-contains("date"), Negate(is.na)) )

Note: Negate(is.na) = function(x) !is.na(x) = ~ !is.na(.)

Output
        date CPI:YoY PPI:MoM
2 2018-01-31     1.2     0.2
4 2018-02-28     0.9    -0.1
6 2018-03-31     1.0     0.3
Data
df <- read.table(text =
"         date      CPI:YoY  PPI:MoM
1   2018-01-15           NA       NA
2   2018-01-31          1.2      0.2
3   2018-01-15           NA       NA
4   2018-02-28          0.9     -0.1
5   2018-03-15           NA       NA
6   2018-03-31          1.0      0.3", check.names = FALSE)

Upvotes: 4

Related Questions