Ryan Shocker

Reputation: 703

Removing Row Instances of a Data.Frame based on Column Value of Another Data.Frame

Background

I have an N x M data.frame MATRIX_1 in R containing a series of numeric values. I also have a second N x M data.frame, MATRIX_2, that maps 1:1 onto the first. Instead of numeric values, it holds booleans indicating whether each data point falls more than 2 standard deviations from the mean of its column.
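For concreteness, a mask like MATRIX_2 could be built along the following lines. This is only a sketch under assumptions: the small data.frame is a hypothetical stand-in for MATRIX_1, and the post does not show how the real mask was computed.

```r
# Hypothetical stand-in for MATRIX_1: 8 rows, 2 numeric columns.
MATRIX_1 <- data.frame(AGE = c(50, 51, 49, 50, 52, 48, 50, 90),
                       BMI = c(25, 26, 24, 25, 27, 26, 25, 24))

# scale() centers each column on its mean and divides by its sample
# standard deviation, so abs(...) > 2 flags values more than 2 SDs out.
MATRIX_2 <- abs(scale(MATRIX_1)) > 2
```

Note that scale() returns a matrix, so the mask here is a logical matrix rather than a data.frame; as.data.frame(MATRIX_2) would convert it if a data.frame is needed.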

Goal

I want to remove every row from MATRIX_1 for which the corresponding row in MATRIX_2 contains a TRUE value in any column.

Example

MATRIX_2
     AGE   SEX   BMI   BP    S1    S2    S3    S4    S5    S6    Y     PROGRESSION
[1,] FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Above, the row contains a TRUE value (under SEX, as printed). Therefore, the entire corresponding row should be removed from MATRIX_1, which looks something like the following:

MATRIX_1
    AGE SEX  BMI     BP  S1    S2   S3   S4     S5  S6   Y PROGRESSION
1    59   2 32.1 101.00 157  93.2 38.0 4.00 4.8598  87 151           1

Attempt

I've seen some approaches that use the %in% operator, but I want this to apply automatically to all columns, whereas something like df1[!(df1$name %in% df2$name),] targets only a single column in the frame.

I've gotten close using subset, but this only checks the first column:

subset(diabetes2, boolean_diabetes2[,1] == TRUE)

Upvotes: 1

Views: 46

Answers (1)

lukeA

Reputation: 54237

To select all rows from MATRIX_1 where the corresponding row in MATRIX_2 contains only FALSE values, you could do:

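# sample data: a 4 x 3 logical mask and a 4 x 3 numeric data.frame
set.seed(1)
MATRIX_2 <- matrix(sample(c(TRUE, FALSE), 3 * 4, replace = TRUE, prob = c(.3, .7)), ncol = 3)
MATRIX_1 <- as.data.frame(matrix(runif(3 * 4), ncol = 3))

# subsetting: rowSums() counts the TRUEs per row; keep rows where that count is 0
MATRIX_1[rowSums(MATRIX_2) == 0, ]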
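The same idea should carry over when the mask is a data.frame, as in the question, since rowSums() also accepts a data.frame of logicals. Below is a minimal sketch under assumptions: the values are hypothetical apart from the first row taken from the question, and the names keep and filtered are introduced only for illustration.

```r
# Hypothetical data shaped like the question's: values plus a logical mask.
MATRIX_1 <- data.frame(AGE = c(59, 48, 72), BMI = c(32.1, 21.6, 30.5))
MATRIX_2 <- data.frame(AGE = c(FALSE, FALSE, FALSE),
                       BMI = c(TRUE, FALSE, FALSE))

# Keep a row only if its mask row has no TRUE.
# na.rm = TRUE treats an NA in the mask as "not flagged" (an assumption).
keep <- rowSums(MATRIX_2, na.rm = TRUE) == 0

# drop = FALSE keeps the result a data.frame even if only one column remains.
filtered <- MATRIX_1[keep, , drop = FALSE]
```

Here the first row is dropped because its BMI entry is flagged, and the remaining two rows are kept. Whether NA in the mask should count as flagged depends on the data; dropping na.rm = TRUE would make such rows produce NA in keep instead.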

Upvotes: 2
