Reputation: 1947
I'm wondering about the most efficient way to check whether a data frame contains only 1's and 0's. I came up with an idea that is, I would say, very intuitive but inefficient.
My idea
is to convert the whole data frame to a vector, then check whether the length of that vector equals the number of elements, summed over all columns, that satisfy ==0 or ==1.
Example
> df<-data.frame(sample(0:1,10,replace=T),sample(0:1,10,replace=T),sample(0:1,10,replace=T))
> df
sample.0.1..10..replace...T. sample.0.1..10..replace...T..1 sample.0.1..10..replace...T..2
1 0 1 1
2 0 0 1
3 1 1 0
4 0 0 0
5 1 0 1
6 0 0 0
7 1 0 0
8 0 0 0
9 0 1 0
10 1 0 1
length(unlist(df,use.names=F))
30
length(df[,1][df[,1]==0])+length(df[,1][df[,1]==1])+length(df[,2][df[,2]==0])+length(df[,2]
[df[,2]==1])+length(df[,3][df[,3]==0])+length(df[,3][df[,3]==1])
30
Is there a faster way to do it?
Upvotes: 1
Views: 1595
Reputation: 72803
You could go for the table function and check if all the names are %in% 0:1. When you want to take missings into account, use the argument useNA=; omit it otherwise.
This is what it looks like:
table(unlist(dat.m), useNA="ifany")
# 0 1 <NA>
# 52 73 10
In action:
all(names(table(unlist(dat), useNA="ifany")) %in% 0:1)
# [1] TRUE
all(names(table(unlist(dat.m), useNA="ifany")) %in% 0:1)
# [1] FALSE
all(names(table(unlist(dat.99), useNA="ifany")) %in% 0:1)
# [1] FALSE
Data:
m <- 15;n <- 9
set.seed(42)
M <- matrix(rbinom(m*n, 1, .5), m, n)
## clean
dat <- as.data.frame(M)
## with missings
M[as.logical(rbinom(length(M), 1, .1))] <- NA
dat.m <- as.data.frame(M)
## with other numbers
M[as.logical(rbinom(length(M), 1, .1))] <- -99
dat.99 <- as.data.frame(M)
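If the counts themselves are not needed, the same check can be sketched without table. The helper name only_binary and its allow_na argument are hypothetical, not part of the answer above; this is a minimal sketch under the assumption that NA should count as a violation unless explicitly allowed.

```r
# Hypothetical helper (name and argument are illustrative only):
# TRUE only if every cell of `d` is 0 or 1. NA counts as a violation
# unless allow_na = TRUE, in which case NAs are dropped first.
only_binary <- function(d, allow_na = FALSE) {
  v <- unlist(d, use.names = FALSE)
  if (allow_na) v <- v[!is.na(v)]
  all(v %in% 0:1)
}

# Small self-contained data, built in the same spirit as above
set.seed(42)
M <- matrix(rbinom(20, 1, .5), 5, 4)
clean <- as.data.frame(M)
M[1, 1] <- NA
with_na <- as.data.frame(M)
M[2, 2] <- -99
with_99 <- as.data.frame(M)

only_binary(clean)                     # TRUE
only_binary(with_na)                   # FALSE
only_binary(with_na, allow_na = TRUE)  # TRUE
only_binary(with_99, allow_na = TRUE)  # FALSE
```

Whether this is actually faster than the table version would need to be measured on your own data.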
Upvotes: 2
Reputation: 5958
Normally I would have gone for
all(sapply(df, function(x) all(x %in% c(0,1))))
[1] TRUE
However, you need to be aware that R will coerce logicals to numeric when evaluating these conditions. This could lead to returning TRUE even for logical values. For example, the previous statement returns TRUE for
test <- c(TRUE, TRUE, FALSE)
Therefore, this solution needs to be modified to check for numerical values, which is what you are after.
all(sapply(test, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] FALSE
all(sapply(df, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] TRUE
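Where the closing parenthesis of all() sits matters here. The following is a small sketch with made-up values (x and flag are illustrative names) showing the difference between testing membership element-wise and collapsing the vector to a single logical first, and why the is.numeric guard is needed for logical input.

```r
x <- c(0, 1, 5)   # 5 is not an allowed value

# Membership is tested for each element, then combined: catches the 5
elementwise <- all(x %in% c(0, 1))
elementwise   # FALSE

# all(x) first collapses x to one logical (with a coercion warning),
# and a single TRUE or FALSE is always "in" c(0, 1)
collapsed <- suppressWarnings(all(x)) %in% c(0, 1)
collapsed     # TRUE

# Logicals compare equal to 0/1, hence the extra type check
flag <- TRUE
flag %in% c(0, 1)   # TRUE
is.numeric(flag)    # FALSE
```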
EDIT: it also works with data that includes NA's: NA is not in c(0,1), so the check returns FALSE.
df_missing <- df
df_missing$nacol <- c(rep(1,9),NA)
all(sapply(df_missing, function(x) is.numeric(x) && all(x %in% c(0,1))))
[1] FALSE
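If it helps to know which columns fail rather than getting a single TRUE/FALSE, a per-column variant can be sketched with vapply. The data frame and the names ok and ok_ignoring_na below are made up for illustration; dropping NA before the check is an assumption about what you might want, not something the question requires.

```r
# Illustrative data: x2 contains an NA, x3 contains a 2
d <- data.frame(x1 = c(0, 1, 1), x2 = c(1, 0, NA), x3 = c(0, 2, 1))

# One logical per column; vapply guarantees a logical vector back
ok <- vapply(d, function(x) is.numeric(x) && all(x %in% c(0, 1)), logical(1))
names(ok)[!ok]                          # "x2" "x3"

# Same check, but treating NA as acceptable by dropping it first
ok_ignoring_na <- vapply(
  d,
  function(x) is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1)),
  logical(1)
)
names(ok_ignoring_na)[!ok_ignoring_na]  # "x3"
```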
Upvotes: 2