arg0naut91

Reputation: 14774

Subset data within function based on value in any column

Let's say I want to write a function like:

Fn <- function(df, to_remove = NULL) {
  df <- df[!df %in% to_remove,]
}

The purpose is to remove every row in which at least one cell equals one of the values given in to_remove. The match is on cell values, not on row numbers, indices, or names.

Any idea why this doesn't work without specifying a column?

Example:

df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

  a b
1 a a
2 a b
3 a a

Expected output:

  a b
1 a a
3 a a

I'm looking for a base R or data.table solution.

Upvotes: 0

Views: 82

Answers (2)

Shree

Reputation: 11150

To remove rows, you need to supply either negative row indices or a logical vector (typically of length nrow(df)). Your code !df %in% to_remove does neither: %in% treats a data frame as a list of columns, so it returns one value per column rather than one per row. Try this instead:

Fn <- function(df, to_remove = NULL) {
  df[!apply(df, 1, function(x) any(x %in% to_remove)), ]
}

Fn(df, "b")
  a b
1 a a
3 a a

Fn(df, c("a", "b"))
[1] a b
<0 rows> (or 0-length row.names)

Fn(df, "d")
  a b
1 a a
2 a b
3 a a
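
To illustrate the shape problem described in this answer, here is a minimal sketch using the question's example data. The variable names per_column and per_row are only for illustration. It compares what `%in%` returns on the whole data frame with what the apply() call returns. A logical index shorter than nrow(df) is recycled, which would explain why the original function appears to keep every row.

```r
df <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

# %in% treats the data frame as a list of columns, so the result has
# one element per column rather than one per row.
per_column <- df %in% "b"
length(per_column) == ncol(df)

# The apply() call checks each row instead, giving one flag per row,
# which is the shape that row subsetting needs.
per_row <- apply(df, 1, function(x) any(x %in% "b"))
length(per_row) == nrow(df)
```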

Upvotes: 1

Roman

Reputation: 4999

Why not a simple loop?

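rowrem <- function(x, val) {
    # Flag rows first; deleting inside the loop would shift the remaining indices
    drop <- logical(nrow(x))
    for (i in seq_len(nrow(x))) {
        for (j in seq_len(ncol(x))) {
            # as.character() lets factor columns compare by label; %in% allows several values
            if (as.character(x[i, j]) %in% val) {
                drop[i] <- TRUE
            }
        }
    }
    x[!drop, , drop = FALSE]
}

Result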
> rowrem(df1, "b")
  a b
1 a a
3 a a

Explanation: What you want to do is check the value of every cell and map each match back to its row number. With base R your choices are a bit limited in that regard. A sensible (i.e., maintainable) solution would probably be something like the above, but I'm sure someone will come up with an lapply or subsetting solution as well.
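
For comparison, here is one way the lapply approach mentioned above might look in base R. This is a sketch rather than a definitive implementation, and the helper name drop_rows_with is hypothetical. It builds one logical vector per column and combines them with Reduce, so no explicit loop over cells is needed.

```r
# Hypothetical helper; the name is not from the original thread.
drop_rows_with <- function(df, to_remove = NULL) {
  # One logical vector per column: TRUE where the cell matches to_remove.
  hits <- lapply(df, `%in%`, to_remove)
  # OR the per-column vectors together to get one flag per row.
  # The all-FALSE init also covers a data frame with no columns.
  any_hit <- Reduce(`|`, hits, logical(nrow(df)))
  df[!any_hit, , drop = FALSE]
}

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))
drop_rows_with(df1, "b")
```

The logical vector any_hit has one element per row, so it should also be usable as the row filter of a data.table if that is preferred.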

Data

df1 <- data.frame(a = c("a", "a", "a"), b = c("a", "b", "a"))

Upvotes: 1
