Reputation: 13
I am new to programming so do bear with me. I have a data frame with about 1500 rows and 1000 variables. I am trying to keep only the columns whose values are all binary, i.e. "0" or "1" (NAs are also allowed), and discard every column that doesn't meet this criterion. Is there a way of doing this without knowing in advance the names of the columns that meet the criterion?
I have read up on the dplyr filter() function and also on base R subsetting, but neither matches what I am looking for.
Upvotes: 1
Views: 1550
Reputation: 593
The new features in dplyr 1.0.0 provide a simple solution to this: select(.data, where(is.logical)), where .data is your tibble/data frame (provided your variables are of data type logical, i.e. TRUE/FALSE).
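If the columns are stored as numeric 0/1 rather than logical, as the question describes, the same where() helper can take a custom predicate instead of is.logical. Below is a minimal sketch, assuming dplyr 1.0.0 or later is installed; the example data frame and the helper name is_binary are made up for illustration and are not part of the answer above.

```r
library(dplyr)

# Hypothetical example data, invented for this sketch
df <- data.frame(
  a = 1:5,
  b = c(0, 1, 0, 1, 0),
  c = c(0, 1, 0, 1, NA),
  d = c(0, 1, 0, 1, 2)
)

# Hypothetical helper: TRUE when every value in the column is 0, 1, or NA
is_binary <- function(x) all(x %in% c(0, 1, NA))

# where() applies the predicate to each column and keeps those returning TRUE
binary_cols <- select(df, where(is_binary))
names(binary_cols)
# [1] "b" "c"
```

Because the predicate is applied to each column in turn, no column names need to be known in advance.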
Upvotes: 1
Reputation: 4233
You can try something like this:
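df <- data.frame(a = 1:5,
                 b = c(0, 1, 0, 1, 0),
                 c = c(0, 1, 0, 1, NA_real_),
                 d = c(0, 1, 0, 1, 2))
# TRUE when every value in the column is 0, 1, or NA
is_binary <- function(x) {
  all(x %in% c(0, 1, NA_real_))
}
# drop = FALSE keeps a data frame even if only one column qualifies
df[, vapply(df, is_binary, logical(1)), drop = FALSE]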
Output:
  b  c
1 0  0
2 1  1
3 0  0
4 1  1
5 0 NA
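One thing to be aware of: a check of the form all(x %in% c(0, 1, NA)) also accepts a column that is entirely NA, since every value is in the allowed set. If that is not wanted, a stricter variant might look like the sketch below. The function name is_strict_binary and the example columns are assumptions made for illustration, not part of the answer above.

```r
# Hypothetical stricter check: requires at least one non-NA value,
# and every non-NA value must be 0 or 1
is_strict_binary <- function(x) {
  values <- x[!is.na(x)]
  length(values) > 0 && all(values %in% c(0, 1))
}

# Hypothetical example data, invented for this sketch
df <- data.frame(
  b = c(0, 1, NA),
  e = c(NA, NA, NA),       # all NA: rejected by the stricter check
  f = c("0", "1", "1")     # character "0"/"1": %in% compares these as strings
)

keep <- vapply(df, is_strict_binary, logical(1))
df[, keep, drop = FALSE]
names(df)[keep]
# [1] "b" "f"
```

Character columns holding "0" and "1" pass because %in% coerces both sides to a common type before comparing.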
Upvotes: 0