Subset data within function based on value in any column

Question

Let's say I want to write a function like:

Fn <- function(df, to_remove = NULL) {
  df <- df[!df %in% to_remove,]
}

The goal is to drop every row in which any value (not the row number, index, or name) equals one of the values given in to_remove.

Any idea why this doesn't work without specifying a column?

Example:

df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

  a b
1 a a
2 a b
3 a a

Expected output:

  a b
1 a a
3 a a

I'm looking for a base R or data.table solution.

Shree · Accepted Answer

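To remove rows, you need to supply either negative row indices or a logical vector (normally of length nrow(df)) of TRUE and FALSE values. Your expression !df %in% to_remove does neither: %in% treats the data frame as a list of columns, so it returns one value per column rather than one per row. Try this: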

Fn <- function(df, to_remove = NULL) {
  df[!apply(df, 1, function(x) any(x %in% to_remove)), ]
}
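To see the difference between the two expressions, it can help to look at the shape of each mask on the example data. This is only an illustrative sketch; the names col_hits and row_hits are made up here and are not part of the original code.

```r
# Example data from the question
df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
to_remove <- "b"

# %in% on a data frame works column by column (a data frame is a list of
# columns), so the result has ncol(df) elements, not nrow(df).
col_hits <- df %in% to_remove
length(col_hits)

# apply(df, 1, ...) visits each row, so this mask has one element per row
# and can be used directly as a row index.
row_hits <- apply(df, 1, function(x) any(x %in% to_remove))
length(row_hits)

# Keep only the rows that contain none of the unwanted values
df[!row_hits, ]
```

Because col_hits is shorter than the number of rows, R recycles it when it is used as a row index, which is why the original function does not filter rows the way it was intended to.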

Fn(df, "b")
  a b
1 a a
3 a a

Fn(df, c("a", "b"))
[1] a b
<0 rows> (or 0-length row.names)

Fn(df, "d")
  a b
1 a a
2 a b
3 a a
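One thing to keep in mind: apply() first converts the data frame to a matrix, so with mixed column types every value is compared as a character string, and numbers can end up formatted as padded strings. A possible alternative in base R, sketched below, builds the row mask column by column instead. Fn2 and the mixed data frame are hypothetical names used only for illustration.

```r
# Variant that checks each column in its own type and then combines the
# per-column results with a logical OR. Assumes df has at least one column.
Fn2 <- function(df, to_remove = NULL) {
  # One logical vector per column: TRUE where the value is in to_remove
  col_hits <- lapply(df, function(col) col %in% to_remove)
  # A row is flagged if any of its columns matched
  row_hits <- Reduce(`|`, col_hits)
  # drop = FALSE keeps the result a data frame even with a single column
  df[!row_hits, , drop = FALSE]
}

df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
res_b <- Fn2(df, "b")
res_b

# Made-up example with a numeric and a character column
mixed <- data.frame(id = c(1, 10, 100), label = c("x", "y", "z"))
res_mixed <- Fn2(mixed, 10)
res_mixed
```

With the default to_remove = NULL, every comparison is FALSE, so the function returns the data frame unchanged.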
