Jeni

Reputation: 958

Conditional dataframe slicing

I would like to remove the rows of this data frame in which the pattern ,2) appears in only one of the columns.

As an example, in this data frame each column is of class character (each value is the text of a vector):

A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)

I would like to subset it, removing row B, as the pattern is present in one of the columns but not in the other.

I tried to use grep, but I don't know how to specify the conditional statement.

How can I achieve this?

Upvotes: 0

Views: 43

Answers (2)

KM_83

Reputation: 727

Not as elegant as the selected answer, but you can also split the string into two variables at the blank space and then build a separate index for each.

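library(dplyr)  # provides the %>% pipe

df = data.frame(v1=c('c(0,1)  c(1,1)','c(0,2)  c(0,1)',
                'c(1,1)  c(0,1)','c(1,2)  c(0,2)'),
                stringsAsFactors = FALSE)  # strsplit() needs character, not factor

# Helpers: drop empty strings, then pick every second element
empty_omit <- function(vec) vec[vec != '']
get_even <- function(vec) vec[seq_along(vec) %% 2 == 0]
get_odd <- function(vec) vec[seq_along(vec) %% 2 == 1]

# Splitting on a single space leaves empty strings between the two fields;
# after removing them, odd positions hold the first field, even the second
df$v2 = strsplit(df$v1, ' ') %>% unlist() %>% empty_omit() %>% get_odd()
df$v3 = strsplit(df$v1, ' ') %>% unlist() %>% empty_omit() %>% get_even()

# fixed = TRUE matches ",2)" literally instead of as a regular expression
idx_v2 = grepl(",2)", df$v2, fixed = TRUE)
idx_v3 = grepl(",2)", df$v3, fixed = TRUE)

# Keep rows where the pattern is in both columns or in neither
df[idx_v2 == idx_v3, ]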
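
The odd/even bookkeeping can probably be avoided by splitting on a run of spaces and binding the pieces into a matrix. This is only a base R sketch of the same idea, assuming every string holds exactly two space-separated fields; the names parts, has_pattern and kept are made up for the illustration.

```r
# Same single-column input as above, kept as character rather than factor
df <- data.frame(v1 = c('c(0,1)  c(1,1)', 'c(0,2)  c(0,1)',
                        'c(1,1)  c(0,1)', 'c(1,2)  c(0,2)'),
                 stringsAsFactors = FALSE)

# " +" splits on one or more spaces, so no empty strings are produced;
# rbind() stacks the two fields of each row into a 4 x 2 character matrix
parts <- do.call(rbind, strsplit(df$v1, " +"))

# grepl() returns a plain vector, so reshape it back to the matrix layout
has_pattern <- matrix(grepl(",2)", parts, fixed = TRUE), nrow = nrow(parts))

# Keep rows where both fields agree: the pattern is in both or in neither
kept <- df[has_pattern[, 1] == has_pattern[, 2], , drop = FALSE]
```

With the sample strings, kept should contain the first, third and fourth rows; the second row is dropped because only its first field contains ,2).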

Upvotes: 0

Gregor Thomas

Reputation: 145775

For a single column we would do this (calling your data d):

d[!grepl(",2)", d$column_name, fixed = TRUE), ]
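
As a quick illustration of the single-column version, here is a sketch run against the sample data from this answer. It assumes the default column names V1, V2, V3 that read.table assigns; has_pattern_v2 and single_col_result are names invented for the example.

```r
# Sample data from this answer; read.table() names the columns V1, V2, V3
d <- read.table(text = 'A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)', stringsAsFactors = FALSE)

# fixed = TRUE treats ",2)" as a literal string rather than a regular expression
has_pattern_v2 <- grepl(",2)", d$V2, fixed = TRUE)

# Drop every row whose V2 value contains the pattern
single_col_result <- d[!has_pattern_v2, ]
```

Here rows B and D both contain ,2) in V2, so only A and C remain. That is not yet the result the question asks for, because the other column has not been checked.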

But we need to check all the columns and find rows that have exactly one match. For this, we'll convert the data to a matrix and use rowSums to count the matches in each row:

n_occurrences = rowSums(matrix(grepl(",2)", as.matrix(d), fixed = TRUE), nrow = nrow(d)))
d[n_occurrences != 1, ]
#   V1     V2     V3
# 1  A c(0,1) c(1,1)
# 3  C c(1,1) c(0,1)
# 4  D c(1,2) c(0,2)

Using this sample data:

d = read.table(text = 'A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)')
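
If this is needed for more than one pattern, the counting step could be wrapped in a small function. The following is a sketch only: drop_single_match is a hypothetical helper name, and it builds the logical matrix with sapply over the columns rather than with as.matrix.

```r
# Sample data from this answer
d <- read.table(text = 'A   c(0,1)  c(1,1)
B   c(0,2)  c(0,1)
C   c(1,1)  c(0,1)
D   c(1,2)  c(0,2)', stringsAsFactors = FALSE)

# Hypothetical helper: drop the rows in which the literal `pattern`
# occurs in exactly one column
drop_single_match <- function(data, pattern) {
  # One logical vector per column: TRUE where the pattern occurs
  hits <- sapply(data, function(col) grepl(pattern, col, fixed = TRUE))
  # sapply() returns a plain vector for a one-row data frame, so force a matrix
  hits <- matrix(hits, nrow = nrow(data))
  data[rowSums(hits) != 1, , drop = FALSE]
}

result <- drop_single_match(d, ",2)")
```

On the sample data this should keep rows A, C and D, matching the output shown earlier in this answer.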

Upvotes: 1
