Egor Ignatenkov

Reputation: 1322

R: subsetting from data.frame by fixing value of many columns

I have a data frame with 24 columns of zeroes and ones. I want to subset the rows that have the same values in the first 12 columns as the first row. How can I do this without typing twelve conditions explicitly?

I can only think of something like

subs<-huge[huge[,1:12]==huge[1,1:12],]

but that's not working.

Error in Ops.data.frame(huge[, 1:12], huge[1, 1:12]) : ‘==’ only defined for equally-sized data frames
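To see why the comparison fails, look at the dimensions of the two operands. This is a minimal sketch that assumes a randomly generated 60-row, 24-column 0/1 data frame standing in for huge. The seed and size are illustrative assumptions, and lhs, rhs and res are hypothetical names.

```r
# Hypothetical example data: 60 rows, 24 columns of 0/1 values
set.seed(353)
huge <- as.data.frame(matrix(sample(0:1, 24 * 60, replace = TRUE), ncol = 24))

lhs <- huge[, 1:12]   # all rows, first 12 columns
rhs <- huge[1, 1:12]  # first row only, still a data.frame

dim(lhs)  # 60 12
dim(rhs)  #  1 12

# Comparing two data frames with == requires identical dimensions,
# so this raises the "equally-sized data frames" error
res <- tryCatch(lhs == rhs, error = function(e) conditionMessage(e))
print(res)
```

Both sides have 12 columns. The mismatch is in the number of rows, so R cannot line the cells up pairwise.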

Upvotes: 0

Views: 2253

Answers (1)

akrun

Reputation: 887098

As the error says, the data frames being compared are not the same size: huge[,1:12] has one row for every row of huge, while huge[1,1:12] has only one row.

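We can make them the same size by replicating each element of the first row (i.e. huge[1,1:12]) by the number of rows of huge. Here, I am using col(huge[1:12]) to do that: it returns a matrix of column indices with the same dimensions as huge[1:12], so indexing the first row with it repeats the value of column j in every cell of column j. We could also use rep (see ?rep). After the replication step, we get a logical index of the non-matching elements (!=) and count them per row (rowSums). A row with a count of 0 matches the first row in all 12 columns. Negating the counts (!rowSums) turns every 0 into TRUE and every non-zero count into FALSE, and we use that to subset the dataset.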
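The replication step can be checked on its own. This sketch rebuilds example data like the answer's and compares the col()-based indexing with the rep() alternative. The names via_col, via_rep and mismatch are illustrative only and are not part of the answer. Both approaches give each first-row value repeated nrow(huge) times, one block per column:

```r
set.seed(353)
huge <- as.data.frame(matrix(sample(0:1, 24 * 60, replace = TRUE), ncol = 24))

# col() gives a 60 x 12 matrix whose entries are column indices (1..12);
# indexing the one-row data frame with it repeats the value of column j
# for every cell of column j
via_col <- huge[1, 1:12][col(huge[1:12])]

# The same vector built explicitly with rep(): each first-row value
# repeated nrow(huge) times, in column-major order
via_rep <- rep(unlist(huge[1, 1:12], use.names = FALSE), each = nrow(huge))

length(via_col)          # 720 = 60 rows * 12 columns
all(via_col == via_rep)  # TRUE

# A plain vector is recycled down the columns of the data frame,
# so either version lines up cell by cell with huge[1:12]
mismatch <- huge[1:12] != via_rep
```

The rep() form states the intended layout explicitly. The col() form avoids spelling out nrow().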

 # count per row the columns (1:12) that differ from row 1; keep rows with a count of 0
 huge[!rowSums(huge[1:12]!= huge[1,1:12][col(huge[1:12])]),]
#    V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
# 1   0  0  0  1  0  1  1  1  0   0   0   0   0   0   0   0   0   1   1   1   1
# 15  0  0  0  1  0  1  1  1  0   0   0   0   1   1   1   0   0   1   0   0   1
# 39  0  0  0  1  0  1  1  1  0   0   0   0   1   0   1   0   0   1   0   0   1
#    V22 V23 V24
# 1    1   1   1
# 15   1   0   1
# 39   0   1   0
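The same logic can be wrapped in a small reusable function and tried on data small enough to check by eye. The function name rows_matching_first and the toy data frame are hypothetical and are introduced only for illustration. This sketch uses rep() rather than col() for the replication step:

```r
# Hypothetical helper: keep the rows of df that agree with row 1
# in every column listed in cols
rows_matching_first <- function(df, cols) {
  ref <- unlist(df[1, cols], use.names = FALSE)
  # Logical matrix: TRUE where a cell differs from row 1's value in that column
  differs <- df[cols] != rep(ref, each = nrow(df))
  # Number of mismatching columns per row; 0 means a full match
  n_diff <- rowSums(differs)
  # n_diff == 0 is equivalent to !n_diff
  df[n_diff == 0, , drop = FALSE]
}

toy <- data.frame(
  a = c(1, 0, 1, 1, 1),
  b = c(0, 0, 0, 1, 0),
  c = c(1, 1, 0, 0, 1)
)

res_ab  <- rows_matching_first(toy, 1:2)  # rows 1, 3 and 5 (a = 1, b = 0)
res_abc <- rows_matching_first(toy, 1:3)  # rows 1 and 5 (a = 1, b = 0, c = 1)
print(res_ab)
print(res_abc)
```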

Data used to build huge:

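 set.seed(353)
 # 60 rows x 24 columns of random 0/1 values
 huge <- as.data.frame(matrix(sample(0:1, 24*60, replace=TRUE), ncol=24))
 # make rows 15 and 39 identical to row 1 in the first 12 columns
 huge[c(15,39),1:12] <- huge[1, 1:12]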
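As an alternative, you can compare each selected column with its own first-row value using Map() and combine the per-column results with Reduce("&", ...). This is a different idiom from the rowSums() approach, not something the answer itself uses. The sketch below runs on the same example data and checks that both approaches keep the same rows. The names cols, per_col, keep, subs_map and subs_rowsums are illustrative.

```r
set.seed(353)
huge <- as.data.frame(matrix(sample(0:1, 24 * 60, replace = TRUE), ncol = 24))
huge[c(15, 39), 1:12] <- huge[1, 1:12]

cols <- 1:12

# One logical vector per column: TRUE where that row equals row 1 in the column
per_col <- Map("==", huge[cols], huge[1, cols])

# Keep a row only if it matches row 1 in every selected column
keep <- Reduce("&", per_col)
subs_map <- huge[keep, ]

# The rowSums() version, for comparison
subs_rowsums <- huge[!rowSums(huge[cols] != huge[1, cols][col(huge[cols])]), ]

identical(rownames(subs_map), rownames(subs_rowsums))  # TRUE
```

The Map()/Reduce() version makes the per-column comparison explicit. The rowSums() version fits in a single expression.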

Upvotes: 3
