robbie

Reputation: 647

R find value in multiple data frame columns

Given a data set where a value could be in any of a set of columns from the dataframe:

df <- data.frame(h1=c('a', 'b', 'c', 'a', 'a', 'b', 'c'), h2=c('b', 'c', 'd', 'b', 'c', 'd', 'b'), h3=c('c', 'd', 'e', 'e', 'e', 'd', 'c'))

How can I get a logical vector that specifies which rows contain the target value? In this case, searching for 'b', I'd want a logical vector with rows (1,2,4,6,7) as TRUE.

The real data set is much larger and more complicated, so I'm trying to avoid a for loop.

thanks

EDIT:

This seems to work.

> apply(df, 1, function(x) {'b' %in% as.vector(t(x))}) -> i
> i
[1]  TRUE  TRUE FALSE  TRUE FALSE  TRUE  TRUE

Upvotes: 5

Views: 11271

Answers (3)

Holger Brandl

Reputation: 11192

I'd rather wrap it in a small helper function that returns the matching rows themselves (not just a logical vector) and performs a case-insensitive search across all columns:

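library(dplyr)    # provides the %>% pipe
library(stringr)  # provides str_detect() and fixed()

# Return the rows of df in which any column contains search_term (case-insensitive)
search_df = function(df, search_term){
    apply(df, 1, function(r){
        any(str_detect(as.character(r), fixed(search_term, ignore_case=TRUE)))
    }) %>% subset(df, .)
}

search_df(iris, "Setosa")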

To make it more generic, the helper can instead take the matching expression as an argument. The expression refers to the current row as x:

match_df = function(df, search_expr){
    filter_fun = eval(substitute(function(x){search_expr}))

    apply(df, 1, function(r) any(filter_fun(r))) %>% subset(df, .)
}

match_df(iris, str_detect(x, "setosa"))
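If you'd rather not depend on dplyr and stringr, something along these lines might do as a base R sketch of the same case-insensitive search. The name search_df_base is made up for illustration, and it works column by column instead of row by row, so it is a different technique from the apply() version above, not a drop-in copy of it.

```r
# Hypothetical base R helper: rows of df where any column contains
# search_term, ignoring case. Not part of any package.
search_df_base <- function(df, search_term) {
  needle <- tolower(search_term)

  # One logical vector per column: does the lower-cased cell contain the term?
  per_column <- lapply(df, function(col) {
    grepl(needle, tolower(as.character(col)), fixed = TRUE)
  })

  # Combine the per-column results element-wise with OR
  hit <- Reduce(`|`, per_column)

  df[hit, , drop = FALSE]
}

res <- search_df_base(iris, "Setosa")
nrow(res)  # 50, the setosa rows of iris
```

Lower-casing both sides is a simple stand-in for ignore_case; it assumes plain substring matching is what you want.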

Upvotes: 0

flodel

Reputation: 89057

If speed is a concern, I would go with:

rowSums(df == "b") > 0
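To spell out why the one-liner works, here is a small sketch using the data frame from the question. The intermediate name hits and the na.rm = TRUE argument are additions for illustration; na.rm assumes you want NA cells treated as non-matches.

```r
# Sample data from the question
df <- data.frame(
  h1 = c("a", "b", "c", "a", "a", "b", "c"),
  h2 = c("b", "c", "d", "b", "c", "d", "b"),
  h3 = c("c", "d", "e", "e", "e", "d", "c")
)

# Comparing the whole data frame at once gives a logical matrix, one cell per value
hits <- df == "b"

# TRUE counts as 1, so rowSums() is the number of matching cells in each row.
# na.rm = TRUE (an assumption, not in the one-liner above) ignores NA cells.
i <- rowSums(hits, na.rm = TRUE) > 0

which(i)  # rows 1, 2, 4, 6 and 7 contain "b"
```

The comparison and the row sums are both vectorised, so no R function is called once per row.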

Upvotes: 10

Hong Ooi

Reputation: 57686

apply(df, 1, function(r) any(r == "b"))
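One thing to watch for if the real data has missing values: any() returns NA for a row that has an NA and no match. A small sketch of the difference, using a made-up data frame df_na that is not from the question:

```r
# Hypothetical data with missing values, for illustration only
df_na <- data.frame(
  h1 = c("a", NA, "c"),
  h2 = c("b", "c", NA),
  stringsAsFactors = FALSE
)

# As written above: rows with an NA and no match come back as NA
plain <- apply(df_na, 1, function(r) any(r == "b"))
plain    # TRUE NA NA

# With na.rm = TRUE, NA cells are dropped before any() is evaluated
cleaned <- apply(df_na, 1, function(r) any(r == "b", na.rm = TRUE))
cleaned  # TRUE FALSE FALSE
```

Whether NA should count as "no match" depends on your data, so treat the na.rm = TRUE variant as one possible choice.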

Upvotes: 6
