Reputation: 1322
I have a data frame with 24 columns of zeroes and ones. I want to subset the rows that have the same values in the first 12 columns as the first row. How can I do this without typing twelve conditions explicitly?
I can only think of something like
subs<-huge[huge[,1:12]==huge[1,1:12],]
but that's not working.
Error in Ops.data.frame(huge[, 1:12], huge[1, 1:12]) : ‘==’ only defined for equally-sized data frames
Upvotes: 0
Views: 2253
Reputation: 887098
As the error says, the datasets compared were not equally-sized.
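To see the mismatch concretely, here is a minimal sketch on a small made-up data frame (df and its columns are hypothetical stand-ins, not the huge data from the question). It only shows that a multi-row data frame and a one-row data frame have different dimensions, which is what == complains about.

```r
# Hypothetical toy data: 4 rows, 3 columns of zeroes and ones
df <- data.frame(a = c(0, 1, 0, 0), b = c(1, 1, 1, 0), c = c(0, 0, 0, 0))

dim(df)       # all rows: 4 x 3
dim(df[1, ])  # first row only: 1 x 3

# Comparing the two data frames directly fails because their sizes differ;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(df == df[1, ], error = function(e) conditionMessage(e))
print(res)
```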
We can make them the same size by replicating each element of the first row (columns 1:12) once per row of huge[1:12]. Here, I am using col(huge[1:12]) as the index to do that; rep (see ?rep) would also work. After the replication step, we get the logical matrix of non-matching elements (!=) and sum it by row (rowSums). A row sum of 0 means that every element in that row matched the first row. Negating the sums (!rowSums(...)) converts the 0 values to TRUE, and that logical vector is used to subset the dataset.
huge[!rowSums(huge[1:12] != huge[1, 1:12][col(huge[1:12])]), ]
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
#1 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1
#15 0 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 0 1 0 0 1
#39 0 0 0 1 0 1 1 1 0 0 0 0 1 0 1 0 0 1 0 0 1
# V22 V23 V24
#1 1 1 1
#15 1 0 1
#39 0 1 0
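The one-liner can be hard to read, so here is the same idea broken into steps, as a sketch on a small made-up data frame. It uses rep() for the replication instead of the col() index; df, k, ref, mismatch and n_mismatch are hypothetical names chosen for illustration.

```r
# Hypothetical toy data: rows 1 and 3 agree in the first two columns
df <- data.frame(a = c(0, 1, 0, 0), b = c(1, 1, 1, 0), c = c(5, 6, 7, 8))
k <- 1:2  # columns to compare against the first row

# Step 1: repeat each first-row value once per row of df, so the vector
# lines up column by column with df[k]
ref <- rep(unlist(df[1, k]), each = nrow(df))

# Step 2: TRUE wherever a value differs from the first row
mismatch <- df[k] != ref

# Step 3: number of mismatches in each row
n_mismatch <- rowSums(mismatch)

# Step 4: keep the rows with no mismatches
subs <- df[n_mismatch == 0, ]
print(subs)
```

For the original data, k would be 1:12 and df would be huge.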
# Reproducible example data used above
set.seed(353)
huge <- as.data.frame(matrix(sample(0:1, 24*60, replace=TRUE), ncol=24))
huge[c(15,39),1:12] <- huge[1, 1:12]
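As a sanity check, the result can be compared against a slower row-by-row test. This is only a sketch: the apply()-based comparison is a different technique used here for verification, and first, keep and idx are hypothetical names. Rows 1, 15 and 39 must be kept by construction; the exact random values depend on the R version, so no output is shown.

```r
# Rebuild the example data
set.seed(353)
huge <- as.data.frame(matrix(sample(0:1, 24 * 60, replace = TRUE), ncol = 24))
huge[c(15, 39), 1:12] <- huge[1, 1:12]

# Row-by-row check: does every one of the first 12 values equal the first row?
first <- unlist(huge[1, 1:12])
keep <- apply(huge[1:12], 1, function(r) all(r == first))

# Vectorised index from the answer
idx <- !rowSums(huge[1:12] != huge[1, 1:12][col(huge[1:12])])

# Both approaches should select the same rows
print(identical(unname(idx), unname(keep)))
print(rownames(huge)[keep])
```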
Upvotes: 3