Conditional dataframe slicing

Question

I would like to remove the rows of a dataframe in which the pattern ,2) appears in only one of the columns.

As an example, in this dataframe each column is of class character (each value is the text representation of a vector):

A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)

I would like to subset it, removing row B, as the pattern is present in one of the columns but not in the other.

I tried to use grep, but I don't know how to specify the conditional statement.

How can I achieve this?

Gregor Thomas · Accepted Answer

For a single column we would do this (calling your data d):

d[!grepl(",2)", d$column_name, fixed = TRUE), ]

But we need to check all the columns and drop the rows that have exactly one match. For this, we'll convert to a matrix, run grepl over it, and use rowSums to count the matches in each row:

n_occurrences = rowSums(matrix(grepl(",2)", as.matrix(d), fixed = TRUE), nrow = nrow(d)))
d[n_occurrences != 1, ]
#   V1     V2     V3
# 1  A c(0,1) c(1,1)
# 3  C c(1,1) c(0,1)
# 4  D c(1,2) c(0,2)

Using this sample data:

d = read.table(text = 'A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)')
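
If you want the whole thing as one runnable script, here is a minimal sketch of the same idea. The helper name count_matches is hypothetical (it is not from base R or any package), and it checks every column, including V1, just as the as.matrix(d) call above does. It uses vapply over the columns instead of as.matrix, which is only a stylistic choice.

```r
# Sample data from the answer; read.table names the columns V1, V2, V3.
d <- read.table(text = 'A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)')

# Hypothetical helper: count, per row, how many columns contain `pattern`.
count_matches <- function(df, pattern) {
  # One logical vector per column; fixed = TRUE matches the pattern literally,
  # so the closing parenthesis needs no escaping.
  hits <- vapply(df, function(col) grepl(pattern, col, fixed = TRUE),
                 logical(nrow(df)))
  # Reshape explicitly so a one-row data frame still gives a matrix.
  rowSums(matrix(hits, nrow = nrow(df)))
}

n_occurrences <- count_matches(d, ",2)")

# Keep rows where the pattern appears in no column or in more than one.
kept <- d[n_occurrences != 1, ]
print(kept)
```

If only some columns should be checked, passing a subset such as d[, c("V2", "V3")] to the helper should work the same way, assuming those are the columns that hold the vectors.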
