Imitation

Reputation: 134

R, for loop with ifelse and grepl function does not give expected results

I'm trying to match the strings in my_list against a data frame (df). Depending on whether there is a match, I need to populate a new_name column in df: with the first string of the matching list element (my_list[[i]][1]) if there is a match, or with the value of the "cat" column if there is none.

My data frame is as follows:

name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)

My list:

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)

My for loop with ifelse and grepl is as follows:

for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- ifelse(
      grepl(paste(my_list[[i]], collapse = "|"), tolower(df[j, "name"])),
      my_list[[i]][1],
      df[j, "cat"])
  }
}

Expected output:

df["new_name"]<- c("leasure", "none", "none", "transportation", "communication")
df

         name            cat       new_name
1 NETFLIX.COM           none        leasure
2      BlueTV           none           none
3         smv           none           none
4       trafi transportation transportation
5     alkatel  communication  communication

With the for loop I wrote, I currently get an exact copy of the "cat" column, meaning that every case is treated as non-matching (FALSE) by ifelse. I'm not sure what's wrong here. Any help would be appreciated!

Upvotes: 0

Views: 417

Answers (2)

user2554330

Reputation: 44788

It doesn't make sense to use ifelse() in that context: it is for vectorized selection. But your code would work if you had the pattern matching right. Unfortunately, for j == 1 and i == 2 (when you expected a match), your pattern is

"leasure|MTV|NETFLIX.COM"

and you are trying to match it to tolower(df[j, "name"]), which is

"netflix.com"
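To see the mismatch in isolation, here is a quick check in base R. It only reuses the strings from the question; the variable names are mine, chosen for illustration.

```r
# Pattern built from the second element of my_list, as in the loop
pattern <- paste(c("leasure", "MTV", "NETFLIX.COM"), collapse = "|")

# grepl() is case-sensitive by default, so the lowercased name cannot match
lowered_match <- grepl(pattern, tolower("NETFLIX.COM"))
lowered_match
#> [1] FALSE

# Without tolower() the same pattern does match
original_match <- grepl(pattern, "NETFLIX.COM")
original_match
#> [1] TRUE
```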

You should map both strings to lowercase, or set ignore.case = TRUE in the grepl() call. For example,

name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)

for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- 
      if( grepl(paste(my_list[[i]], collapse="|"), df[j, "name"],
            ignore.case = TRUE))
        my_list[[i]][1] 
      else df[j, "cat"]
  }
}
df
#>          name            cat       new_name
#> 1 NETFLIX.COM           none        leasure
#> 2      BlueTV           none           none
#> 3         smv           none           none
#> 4       trafi transportation transportation
#> 5     alkatel  communication  communication

Created on 2021-08-10 by the reprex package (v2.0.0)
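
One thing to watch with the loop above: the inner loop assigns on every pass, so a match on an earlier element of my_list is overwritten by the cat fallback whenever a later element does not match. It works for your data only because the single match is in the last element. Below is a minimal sketch that keeps the first match instead. It assumes "first match wins" is what you want (the question doesn't say), and the extra "AIRCAT" row is hypothetical, added only to show the difference.

```r
# Question data plus one hypothetical row ("AIRCAT") that matches the
# FIRST element of my_list
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel", "AIRCAT")
cat <- c("none", "none", "none", "transportation", "communication", "none")
df <- data.frame(name, cat, stringsAsFactors = FALSE)

travel <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel, leasure)

# Start from the fallback, then fill in matches one list element at a time
df$new_name <- df$cat
matched <- rep(FALSE, nrow(df))   # rows that already received a name

for (i in seq_along(my_list)) {
  pattern <- paste(my_list[[i]], collapse = "|")
  # grepl() is vectorized over df$name, so no loop over rows is needed
  hit <- !matched & grepl(pattern, df$name, ignore.case = TRUE)
  df$new_name[hit] <- my_list[[i]][1]
  matched <- matched | hit
}

df$new_name
#> [1] "leasure"        "none"           "none"           "transportation"
#> [5] "communication"  "travel"
```

With the original nested loop, the "AIRCAT" row would end up as "none", because the non-matching second element overwrites the earlier "travel".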

Generally speaking, using pattern matching to find whether a string is in a list is tricky: make sure the strings in my_list never include characters that grepl() treats as special in a regular expression. (Strictly, the "." in "NETFLIX.COM" is already one: it matches any single character.) For your example, you'll get the same result as grepl() gives by using the test

tolower(df[j, "name"]) %in% tolower(my_list[[i]])

but that's not true for all possible name values: the grepl() code allows partial matches (e.g. df[j, "name"] equal to "netflix.com in a long string") and %in% doesn't.
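
A small sketch of both points, in base R. The test strings and the helper matches_any_literal() are hypothetical, made up for illustration; only the keys come from your list.

```r
keys <- c("leasure", "MTV", "NETFLIX.COM")
long_name <- "watching netflix.com tonight"   # hypothetical name value

# Regex matching finds the key inside the longer string ...
partial_regex <- grepl(paste(keys, collapse = "|"), long_name, ignore.case = TRUE)
# ... while %in% only accepts an exact, whole-string match
exact_in <- tolower(long_name) %in% tolower(keys)

# The "." in "NETFLIX.COM" is a regex wildcard, so this also matches
dot_wildcard <- grepl("NETFLIX.COM", "netflixXcom", ignore.case = TRUE)

# Hypothetical helper: literal (fixed = TRUE) partial matching.
# fixed = TRUE cannot be combined with ignore.case or with "|",
# so both sides are lowercased and each key is tested separately.
matches_any_literal <- function(x, keys) {
  any(vapply(tolower(keys),
             function(k) grepl(k, tolower(x), fixed = TRUE),
             logical(1)))
}

literal_partial <- matches_any_literal(long_name, keys)
literal_dot <- matches_any_literal("netflixXcom", keys)

c(partial_regex, exact_in, dot_wildcard, literal_partial, literal_dot)
#> [1]  TRUE FALSE  TRUE  TRUE FALSE
```

Which behaviour is right depends on whether partial matches should count; the literal version keeps partial matching but stops treating "." and similar characters as special.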

Upvotes: 2

Ronak Shah

Reputation: 388862

Here is one way using stringr::str_replace_all -

travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
#Create a named list
my_list<- dplyr::lst(travel, leasure)


result <- stringr::str_replace_all(df$name, setNames(names(my_list), 
          sapply(my_list, paste0, collapse = '|')))

#If the result is same as original value keep the previous cat.
df$new_name <- ifelse(result == df$name, df$cat, result)
df

#         name            cat       new_name
#1 NETFLIX.COM           none        leasure
#2      BlueTV           none           none
#3         smv           none           none
#4       trafi transportation transportation
#5     alkatel  communication  communication

Here the important part is this code -

setNames(names(my_list), sapply(my_list, paste0, collapse = '|'))

#travel|air_com|AIRCAT|tivago      leasure|MTV|NETFLIX.COM 
#                    "travel"                    "leasure" 

This means that whenever the pattern travel|air_com|AIRCAT|tivago matches part of a string, the matched part is replaced with "travel", and likewise for "leasure".
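
Note that str_replace_all() swaps only the matched substring, so a name such as "NETFLIX.COM plan" would become "leasure plan" rather than "leasure". If you want the whole value replaced, one option is to test each pattern with grepl() and look up the category name. The following is a rough base R sketch of that idea, not the stringr approach itself; the extra "NETFLIX.COM plan" row is hypothetical, and matching stays case-sensitive as in the code above.

```r
# Question data plus one hypothetical row with extra text around the key
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel", "NETFLIX.COM plan")
cat <- c("none", "none", "none", "transportation", "communication", "none")
df <- data.frame(name, cat, stringsAsFactors = FALSE)

# Named list, so the category names travel along with the patterns
my_list <- list(
  travel  = c("travel", "air_com", "AIRCAT", "tivago"),
  leasure = c("leasure", "MTV", "NETFLIX.COM")
)

# One alternation pattern per category, named by category
patterns <- vapply(my_list, paste, character(1), collapse = "|")

# Logical matrix: one row per name, one column per category
hits <- sapply(patterns, grepl, x = df$name)

# Name of the first matching category for each row, NA if none matches
first_hit <- apply(hits, 1, function(r) {
  if (any(r)) names(patterns)[which(r)[1]] else NA_character_
})

# Fall back to the existing cat value where nothing matched
df$new_name <- ifelse(is.na(first_hit), df$cat, first_hit)
df$new_name
#> [1] "leasure"        "none"           "none"           "transportation"
#> [5] "communication"  "leasure"
```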

Upvotes: 0
