Suppose I have a dataframe df as below:
date CPI:YoY PPI:MoM
1 2018-01-15 NA NA
2 2018-01-31 1.2 0.2
3 2018-01-15 NA NA
4 2018-02-28 0.9 -0.1
5 2018-03-15 NA NA
6 2018-03-31 1.0 0.3
I need to implement the following two steps with dplyr: select the columns whose names do not contain "date", and then drop the rows where all values in those columns are NAs.
To select the columns not containing "date", I can use dplyr::select(-grep("date", names(df))) or dplyr::select(-contains("date", ignore.case = TRUE)); to drop the all-NA rows in the selected columns, I can use filter_at(vars(cols), any_vars(!is.na(.))) or drop_na(cols).
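Run separately, the two steps above look like this (a sketch, assuming a reduced version of the df shown at the top; the backticked column names are required because they are non-syntactic):

```r
library(dplyr)

# A reduced version of the df from the question (check.names = FALSE keeps the colons)
df <- data.frame(
  date = c("2018-01-15", "2018-01-31"),
  `CPI:YoY` = c(NA, 1.2),
  `PPI:MoM` = c(NA, 0.2),
  check.names = FALSE
)

# Step 1: the column names that do not contain "date"
cols <- grep("date", names(df), value = TRUE, invert = TRUE)

# Step 2: keep rows with at least one non-NA value among those columns
df %>% filter_at(vars(all_of(cols)), any_vars(!is.na(.)))
```

This keeps only the 2018-01-31 row, matching the expected result for those two dates.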
Now my question is: how can I combine these two steps into one pipeline? Thanks.
Note: For this example we could simply use drop_na(`CPI:YoY`, `PPI:MoM`) to get the expected result, but real data usually has many columns, and the date column is not always the first one, so I'd prefer to subset the columns by excluding the date column.
The expected result:
date CPI:YoY PPI:MoM
1 2018-01-31 1.2 0.2
2 2018-02-28 0.9 -0.1
3 2018-03-31 1.0 0.3
References:
Subset data to contain only columns whose names match a condition
filtering data frame based on NA on multiple columns
You could use filter + if_any/if_all:
df %>%
filter( !if_all(-contains("date"), is.na) )
df %>%
filter( if_any(-contains("date"), ~ !is.na(.)) )
df %>%
filter( if_any(-contains("date"), Negate(is.na)) )
Note: Negate(is.na) = function(x) !is.na(x) = ~ !is.na(.)
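The equivalence of the three spellings can be checked directly (a small sketch; rlang::as_function converts the formula shorthand into an ordinary function):

```r
x <- c(1, NA, 3)

# All three forms negate is.na element-wise
identical(Negate(is.na)(x), !is.na(x))
identical((function(x) !is.na(x))(x), !is.na(x))
identical(rlang::as_function(~ !is.na(.))(x), !is.na(x))
```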
date CPI:YoY PPI:MoM
2 2018-01-31 1.2 0.2
4 2018-02-28 0.9 -0.1
6 2018-03-31 1.0 0.3
df <- read.table(text =
" date CPI:YoY PPI:MoM
1 2018-01-15 NA NA
2 2018-01-31 1.2 0.2
3 2018-01-15 NA NA
4 2018-02-28 0.9 -0.1
5 2018-03-15 NA NA
6 2018-03-31 1.0 0.3", check.names = FALSE)
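As a sanity check (a sketch on a reduced version of the data above), all three filter spellings keep the same rows:

```r
library(dplyr)

# A reduced version of the df constructed above
df <- read.table(text =
" date CPI:YoY PPI:MoM
1 2018-01-15 NA NA
2 2018-01-31 1.2 0.2", check.names = FALSE)

# The three variants are interchangeable
a <- df %>% filter(!if_all(-contains("date"), is.na))
b <- df %>% filter(if_any(-contains("date"), ~ !is.na(.)))
c <- df %>% filter(if_any(-contains("date"), Negate(is.na)))
stopifnot(identical(a, b), identical(b, c), nrow(a) == 1)
```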