John

Reputation: 1947

How to check if an R data frame contains only 1's or 0's

I'm wondering what the most efficient way is to check whether a data frame contains only 1's or 0's. I came up with what I would call a very intuitive but inefficient idea.

My idea

Convert the whole data frame into a vector, then check whether the length of that vector equals the sum, over all columns, of the number of elements that are ==0 or ==1.

Example

> df<-data.frame(sample(0:1,10,replace=T),sample(0:1,10,replace=T),sample(0:1,10,replace=T))
> df
   sample.0.1..10..replace...T. sample.0.1..10..replace...T..1 sample.0.1..10..replace...T..2
1                             0                              1                              1
2                             0                              0                              1
3                             1                              1                              0
4                             0                              0                              0
5                             1                              0                              1
6                             0                              0                              0
7                             1                              0                              0
8                             0                              0                              0
9                             0                              1                              0
10                            1                              0                              1



> length(unlist(df, use.names = FALSE))
[1] 30
> length(df[,1][df[,1]==0]) + length(df[,1][df[,1]==1]) +
+   length(df[,2][df[,2]==0]) + length(df[,2][df[,2]==1]) +
+   length(df[,3][df[,3]==0]) + length(df[,3][df[,3]==1])
[1] 30
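The same counting idea can be expressed without indexing each column by hand: flatten the data frame once with `unlist()` and count the 0s and 1s in a single vectorised step. This is a minimal sketch, assuming `df` holds only numeric or integer columns; the toy data (columns `a`, `b`, `c`, seed 1) is made up to stand in for the question's `df`.

```r
# Toy data standing in for the question's df: three 0/1 integer columns
set.seed(1)
df <- data.frame(a = sample(0:1, 10, replace = TRUE),
                 b = sample(0:1, 10, replace = TRUE),
                 c = sample(0:1, 10, replace = TRUE))

# Flatten all columns into one vector
v <- unlist(df, use.names = FALSE)

# The question's check, vectorised: number of 0s and 1s equals total length
sum(v == 0 | v == 1) == length(v)

# Equivalent and shorter: every element must be in the set {0, 1}
all(v %in% c(0, 1))
```

One difference between the two: if `v` contains NA, `v == 0 | v == 1` yields NA and the first comparison returns NA, whereas `%in%` treats NA as "not matching" and returns FALSE.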

Is there a faster way to do it?

Upvotes: 1

Views: 1595

Answers (2)

jay.sf

Reputation: 72803

You could use the table function and check whether all of its names are %in% 0:1. If you want to take missing values into account, use the argument useNA=; omit it otherwise.

This is what it looks like:

table(unlist(dat.na), useNA="ifany")
#  0    1 <NA> 
# 52   73   10 

In action:

all(names(table(unlist(dat), useNA="ifany")) %in% 0:1)
# [1] TRUE
all(names(table(unlist(dat.na), useNA="ifany")) %in% 0:1)
# [1] FALSE
all(names(table(unlist(dat.99), useNA="ifany")) %in% 0:1)
# [1] FALSE

Data:

m <- 15; n <- 9
set.seed(42)
M <- matrix(rbinom(m*n, 1, .5), m, n)

## clean
dat <- as.data.frame(M)

## with missings
M[as.logical(rbinom(length(M), 1, .1))] <- NA
dat.na <- as.data.frame(M)

## with other numbers
M[as.logical(rbinom(length(M), 1, .1))] <- -99
dat.99 <- as.data.frame(M)
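The table() check can be wrapped in a small helper so that the NA policy becomes an explicit argument. This is only a sketch: the function name `only_01_table` and its `allow_na` argument are hypothetical, not part of base R. With `useNA = "no"`, missing values are dropped before the names are checked; with `useNA = "ifany"`, an NA entry appears in the table and makes the check fail.

```r
# Hypothetical helper: TRUE if every value in `dat` is 0 or 1.
# allow_na = FALSE: any NA makes the result FALSE.
# allow_na = TRUE:  NAs are ignored.
only_01_table <- function(dat, allow_na = FALSE) {
  tab <- table(unlist(dat), useNA = if (allow_na) "no" else "ifany")
  # names(tab) are character ("0", "1", possibly NA); compare as strings
  all(names(tab) %in% c("0", "1"))
}

# Same data construction as above
m <- 15; n <- 9
set.seed(42)
M <- matrix(rbinom(m * n, 1, 0.5), m, n)
dat <- as.data.frame(M)

M[as.logical(rbinom(length(M), 1, 0.1))] <- NA
dat.na <- as.data.frame(M)

only_01_table(dat)                      # clean 0/1 data
only_01_table(dat.na)                   # NA present and not allowed
only_01_table(dat.na, allow_na = TRUE)  # NA present but ignored
```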

Upvotes: 2

gaut

Reputation: 5958

Normally I would have gone for

all(sapply(df, function(x) all(x %in% c(0,1))))
[1] TRUE

However, be aware that R coerces logical values to numeric (TRUE to 1, FALSE to 0) when evaluating %in%, so this check also returns TRUE for a logical vector. For example, the previous statement returns TRUE for

test <- c(TRUE, TRUE, FALSE)

Therefore, the solution needs an extra is.numeric() check, because numeric 0/1 values are what you are after.

all(sapply(test, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] FALSE
all(sapply(df, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] TRUE
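If you also want to know which columns fail, the same per-column test can be run with `vapply()`, which guarantees exactly one logical per column instead of relying on `sapply()`'s simplification. This is a sketch; the data frame `mixed` and its column names are made up for illustration.

```r
# Illustrative data: one valid 0/1 column, one logical column, one containing a 2
mixed <- data.frame(ok    = c(0, 1, 1, 0),
                    flag  = c(TRUE, FALSE, TRUE, TRUE),
                    other = c(0, 1, 2, 1))

# One TRUE/FALSE per column: numeric AND every value is 0 or 1
col_ok <- vapply(mixed,
                 function(x) is.numeric(x) && all(x %in% c(0, 1)),
                 logical(1))
col_ok

# Names of the offending columns: "flag" "other"
names(col_ok)[!col_ok]

# Overall answer for the whole data frame
all(col_ok)
```

The logical column `flag` is rejected by `is.numeric()` even though its values would match `c(0, 1)` after coercion.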

EDIT: it also works with data that include NAs. Because NA %in% c(0,1) is FALSE, a column containing NA fails the check:

df_missing <- df
df_missing$nacol <- c(rep(1,9),NA)
all(sapply(df_missing, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] FALSE
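Whether an NA should count as a violation depends on your data. As a sketch (the helper `is_01_col` and its `na_ok` argument are hypothetical), missing values can either fail the check, since `NA %in% c(0, 1)` is `FALSE`, or be skipped by testing only the non-missing values:

```r
# Hypothetical helper: is column x numeric with only 0/1 values?
# na_ok = TRUE ignores missing values; na_ok = FALSE treats them as failures.
is_01_col <- function(x, na_ok = FALSE) {
  if (!is.numeric(x)) return(FALSE)
  if (na_ok) x <- x[!is.na(x)]
  all(x %in% c(0, 1))
}

df_missing <- data.frame(a = c(0, 1, 1, 0), nacol = c(1, 1, 0, NA))

# NA counts as a failure
all(vapply(df_missing, is_01_col, logical(1)))
# NA is ignored (na_ok is passed through vapply's ...)
all(vapply(df_missing, is_01_col, logical(1), na_ok = TRUE))
```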

Upvotes: 2
