roody

Reputation: 2663

Creating a logical variable that identifies whether a string variable occurs within another string

I have a quick question about grep that I can't seem to resolve. Let's say that I have a vector of names: brand <- c(Brand1, Brand2, Brand3, Brand4). I'd like to identify whether any of these names occurs within another string variable (var1), and then create a logical variable (TRUE/FALSE) from the result.

ID        var1                    var_filter
1         Text about Brand 1      TRUE
1         Text                    FALSE
1         Text about Brand 2      TRUE
1         Text about Brand 3      TRUE
1         Text                    FALSE
1         Text about Brand 1      TRUE

How would I go about doing this? My guess is grep, but I'm not sure how to do it when I have an entire list of possible strings instead of a single string.

Upvotes: 0

Views: 139

Answers (3)

IRTFM

Reputation: 263411

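Brand1 <- "Brand 1"; Brand2 <- "Brand 2"; Brand3 <- "Brand 3"; Brand4 <- "Brand 4"
brand <- c(Brand1, Brand2, Brand3, Brand4)

# Join the names into one alternation pattern ("Brand 1|Brand 2|...") and test every element of var1
dfrm$var_filter <- grepl(paste(brand, collapse = "|"), dfrm$var1)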
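For anyone who wants to run this end to end, here is a minimal sketch. The data frame dfrm is not defined in the answer, so the one below is an assumption rebuilt from the table in the question; the brand names are taken to contain a space, as they do in that table.

```r
# Hypothetical data frame rebuilt from the table in the question
dfrm <- data.frame(
  ID = 1,
  var1 = c("Text about Brand 1", "Text", "Text about Brand 2",
           "Text about Brand 3", "Text", "Text about Brand 1"),
  stringsAsFactors = FALSE
)
brand <- c("Brand 1", "Brand 2", "Brand 3", "Brand 4")

# A single regular expression that matches if any brand name occurs
pattern <- paste(brand, collapse = "|")

# grepl() is vectorised over var1 and returns one logical per row
dfrm$var_filter <- grepl(pattern, dfrm$var1)
dfrm$var_filter
# TRUE FALSE TRUE TRUE FALSE TRUE
```

Note that the pattern is treated as a regular expression, so brand names containing characters such as . or ( would need escaping before being pasted together.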

Upvotes: 1

Dason

Reputation: 61953

I use a combination of sapply, grepl, and any to accomplish the task. The idea is to use grepl to find which elements of the text contain a given brand, and sapply to repeat that for each brand. This gives a logical matrix with one row per text element and one column per brand. Then apply with any collapses each row, identifying which elements of the text contain at least one of the brands.

brands <- c("CatJuice", "robopuppy", "DasonCo")

text <- c("nononono", "That CatJuice is great", "blargcats", "I gave the robopuppy some CatJuice")

id <- sapply(brands, grepl, text, fixed = TRUE)
# if case sensitivity is an issue
#id <- sapply(tolower(brands), grepl, tolower(text), fixed = TRUE)
apply(id, 1, any)

This is case sensitive, so if that is an issue you can use tolower to convert both the brands and the text to lower case before matching.
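A minimal sketch of the case-insensitive variant, spelled out as runnable code. The sample text here is altered from the example above (mixed capitalisation) purely for illustration, and has_brand is a name made up for this sketch.

```r
brands <- c("CatJuice", "robopuppy", "DasonCo")
text <- c("nononono", "That catjuice is great", "blargcats",
          "I gave the RoboPuppy some CatJuice")

# Lower-case both sides, then build a logical matrix:
# one row per element of text, one column per brand
id <- sapply(tolower(brands), grepl, tolower(text), fixed = TRUE)

# Collapse each row: TRUE if any brand matched that element
has_brand <- apply(id, 1, any)
has_brand
# FALSE TRUE FALSE TRUE
```

Since id is a logical matrix, rowSums(id) > 0 gives the same result without apply.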

Upvotes: 1

Rcoster

Reputation: 3210

You can use | in patterns. Like this:

dados <- read.table(text='ID var1
1 TextaboutBrand1
1 Text
1 TextaboutBrand2
1 TextaboutBrand3
1 Text
1 TextaboutBrand1', header=TRUE, sep=' ')

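# Brand names written as they appear in this sample data (no spaces); assumed, since the question does not define them
brand <- c('Brand1', 'Brand2', 'Brand3', 'Brand4')

# apply() passes each row as a character vector, so x[2] is that row's var1
grep1 <- function(x, brand) { length(grep(paste0(brand, collapse='|'), x[2])) == 1 }

apply(dados, 1, grep1, brand)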

Or use mapply()...
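The answer does not show the mapply() version, so the following is only one possible reading of it. The three-row dados and the name res are assumptions for this sketch.

```r
# Hypothetical cut-down version of the sample data
dados <- data.frame(
  ID = 1,
  var1 = c("TextaboutBrand1", "Text", "TextaboutBrand2"),
  stringsAsFactors = FALSE
)
brand <- c("Brand1", "Brand2", "Brand3", "Brand4")
pattern <- paste0(brand, collapse = "|")

# mapply() calls the function once per element of var1;
# MoreArgs carries the pattern, which is the same for every call
res <- mapply(function(x, pattern) grepl(pattern, x),
              dados$var1,
              MoreArgs = list(pattern = pattern),
              USE.NAMES = FALSE)
res
# TRUE FALSE TRUE
```

Because grepl() is already vectorised over its second argument, grepl(pattern, dados$var1) returns the same vector directly, so the explicit loop is not strictly needed.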

Upvotes: 0
