Kuwala

Reputation: 13

Keeping only certain columns of a data frame provided they match a condition

I am new to programming, so do bear with me. I have a data frame with about 1500 rows and 1000 variables. I am trying to keep only the columns that contain binary values, i.e. "0" or "1" (NAs are also allowed), and discard every column that doesn't meet this criterion. Is there a way of doing this without knowing in advance the names of the columns that meet it?

I have read up on the dplyr filter() function and on base R subsetting, but neither does what I am looking for.

Upvotes: 1

Views: 1550

Answers (2)

knytt

Reputation: 593

The new features in dplyr 1.0.0 provide a simple solution to this: select(.data, where(is.logical)), where .data is your tibble/data frame. Note that this only works if your variables are of data type logical, i.e. TRUE/FALSE; numeric columns holding 0 and 1 are not matched by is.logical.
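If the columns are stored as numeric 0/1 rather than logical, the same where() helper can take a custom predicate instead of is.logical. The following is only a sketch under that assumption: the data frame and the helper name is_binary are made up for illustration, and it assumes dplyr 1.0.0 or later is installed.

```r
library(dplyr)

# Hypothetical example data; the column names are illustrative only.
df <- data.frame(a = 1:5,
                 b = c(0, 1, 0, 1, 0),
                 c = c(0, 1, 0, 1, NA),
                 d = c(0, 1, 0, 1, 2))

# Illustrative helper: TRUE when every value in the column is 0, 1, or NA.
is_binary <- function(x) all(x %in% c(0, 1, NA))

# where() applies the predicate to each column and keeps those returning TRUE.
binary_cols <- select(df, where(is_binary))
names(binary_cols)
```

With this example data, only columns b and c should be kept, since a and d contain values other than 0, 1, and NA.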

Upvotes: 1

slava-kohut

Reputation: 4233

You can try something like this:

df <- data.frame(a=1:5,
                 b=c(0,1,0,1,0),
                 c=c(0,1,0,1,NA_real_),
                 d=c(0,1,0,1,2))

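# TRUE if every value in the column is 0, 1, or NA
is_binary <- function(x){
  all(x %in% c(0, 1, NA_real_))
}

# Keep the matching columns; drop = FALSE ensures the result is still
# a data frame even when only one column matches
df[, sapply(df, is_binary), drop = FALSE]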

Output:

  b  c
1 0  0
2 1  1
3 0  0
4 1  1
5 0 NA
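A related base R option is Filter(), which applies a predicate to each column and keeps the ones for which it returns TRUE. The sketch below uses a made-up two-column data frame and an illustrative is_binary helper; it is one possible variant, not the only way to write this.

```r
# Hypothetical example data with a single binary column.
df <- data.frame(a = 1:5,
                 b = c(0, 1, 0, 1, NA))

# Illustrative helper: TRUE when every value in the column is 0, 1, or NA.
is_binary <- function(x) all(x %in% c(0, 1, NA))

# Filter() subsets the data frame list-style, so the result stays a
# data frame even when only one column passes the test.
kept <- Filter(is_binary, df)
is.data.frame(kept)
names(kept)
```

Be aware that a column consisting entirely of NA also passes this check; if such columns should be discarded, the predicate would need an extra condition such as !all(is.na(x)).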

Upvotes: 0
