I'm trying to match the strings in my data frame (df) against my_list and, depending on the result (TRUE/FALSE), populate a new_name column in df: with the first string of the matching list element (my_list[[i]][1]) if TRUE, or with the value of the "cat" column if there is no match.
My data frame is as follows:
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)
My list:
travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)
My for loop with ifelse and grepl is as follows:
for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <- ifelse(
      grepl(paste(my_list[[i]], collapse = "|"), tolower(df[j, "name"])),
      my_list[[i]][1],
      df[j, "cat"])
  }
}
The expected output is:
df["new_name"]<- c("leasure", "none", "none", "transportation", "communication")
df
name cat new_name
1 NETFLIX.COM none leasure
2 BlueTV none none
3 smv none none
4 trafi transportation transportation
5 alkatel communication communication
Currently, the for loop I wrote gives me an exact copy of the "cat" column, meaning that every case is treated as non-matching (FALSE) by ifelse(). I'm not sure what's wrong here. Any help would be appreciated!
It doesn't make sense to use ifelse() in that context: it is meant for vectorized selection, and for a single TRUE/FALSE value a plain if/else is the natural choice. But your code would work if you had the pattern matching right. Unfortunately, for j == 1 and i == 2 (when you expected a match), your pattern is "leasure|MTV|NETFLIX.COM" and you are trying to match it to tolower(df[j, "name"]), which is "netflix.com". grepl() is case-sensitive by default, so "NETFLIX.COM" in the pattern never matches "netflix.com". You should map both the pattern and the name to lowercase, or set ignore.case = TRUE in the grepl() call. For example,
name <- c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel")
cat<- c("none", "none", "none", "transportation", "communication")
df<-data.frame(name, cat)
travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
my_list<- list(travel, leasure)
for (j in 1:nrow(df)) {
  for (i in 1:length(my_list)) {
    df[j, "new_name"] <-
      if (grepl(paste(my_list[[i]], collapse = "|"), df[j, "name"],
                ignore.case = TRUE))
        my_list[[i]][1]
      else df[j, "cat"]
  }
}
df
#> name cat new_name
#> 1 NETFLIX.COM none leasure
#> 2 BlueTV none none
#> 3 smv none none
#> 4 trafi transportation transportation
#> 5 alkatel communication communication
Created on 2021-08-10 by the reprex package (v2.0.0)
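One caveat about this loop: for each row it assigns new_name once per element of my_list, so whichever element is checked last wins. That happens to give the right answer here because the only match is in leasure, the last element, but a name matching travel (such as "tivago") would first get "travel" and then be overwritten with df[j, "cat"] when i == 2. Below is a minimal sketch that stops at the first match instead; the extra "tivago" row and the helper first_group() are hypothetical, added only for illustration.

```r
travel  <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel, leasure)

# Same data as the question, plus one hypothetical row ("tivago") that
# matches the first element of my_list, added only for illustration.
df <- data.frame(
  name = c("NETFLIX.COM", "BlueTV", "smv", "trafi", "alkatel", "tivago"),
  cat  = c("none", "none", "none", "transportation", "communication", "none"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label (first string) of the first element
# of `lists` whose combined pattern matches x, or `default` if none matches.
first_group <- function(x, lists, default) {
  for (grp in lists) {
    if (grepl(paste(grp, collapse = "|"), x, ignore.case = TRUE)) {
      return(grp[1])  # stop here so later groups cannot overwrite the match
    }
  }
  default
}

# Apply the helper row by row; arguments are named so mapply() matches them correctly.
df$new_name <- mapply(first_group, x = df$name, default = df$cat,
                      MoreArgs = list(lists = my_list), USE.NAMES = FALSE)
df
```

Calling a small function once per name also sidesteps the ifelse()-versus-if question, since each call handles a single value.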
Generally speaking, using pattern matching to find whether a string is in a list is tricky: be really careful that the strings in my_list never include any characters that grepl() treats as special in a regular expression (for instance, the "." in "NETFLIX.COM" matches any character). For your example, the test
tolower(df[j, "name"]) %in% tolower(my_list[[i]])
gives the same result as grepl(), but that's not true for all possible name values: the grepl() code will allow partial matches (e.g. df[j, "name"] equal to "netflix.com in a long string") and %in% won't.
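If the entries of my_list are meant as literal strings rather than regular expressions, one option is to lower-case both sides and pass fixed = TRUE to grepl(), testing each entry separately. The tolower() calls are needed because ignore.case = TRUE has no effect when fixed = TRUE. This is a minimal sketch; matches_literal() is a hypothetical helper, not part of either answer.

```r
# Hypothetical helper: TRUE if x contains any entry of `terms` as a literal,
# case-insensitive substring (regex metacharacters such as "." are not special).
matches_literal <- function(x, terms) {
  x <- tolower(x)
  any(vapply(tolower(terms), function(t) grepl(t, x, fixed = TRUE), logical(1)))
}

leasure <- c("leasure", "MTV", "NETFLIX.COM")

# As a regex, the "." in "NETFLIX.COM" matches any character:
grepl(paste(leasure, collapse = "|"), "NETFLIXXCOM", ignore.case = TRUE)  # TRUE
# Treated literally, it only matches a real dot:
matches_literal("NETFLIXXCOM", leasure)                    # FALSE
# Substring matches are still allowed:
matches_literal("netflix.com in a long string", leasure)   # TRUE
# %in% compares whole strings only:
tolower("netflix.com in a long string") %in% tolower(leasure)  # FALSE
```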
Here is one way using stringr::str_replace_all():
travel<- c("travel","air_com", "AIRCAT", "tivago")
leasure<- c("leasure","MTV", "NETFLIX.COM")
#Create a named list
my_list<- dplyr::lst(travel, leasure)
result <- stringr::str_replace_all(df$name, setNames(names(my_list),
sapply(my_list, paste0, collapse = '|')))
# If the result is the same as the original value, keep the previous cat.
df$new_name <- ifelse(result == df$name, df$cat, result)
df
# name cat new_name
#1 NETFLIX.COM none leasure
#2 BlueTV none none
#3 smv none none
#4 trafi transportation transportation
#5 alkatel communication communication
The important part here is this code:
setNames(names(my_list), sapply(my_list, paste0, collapse = '|'))
#travel|air_com|AIRCAT|tivago leasure|MTV|NETFLIX.COM
# "travel" "leasure"
This means that whenever the pattern travel|air_com|AIRCAT|tivago is found in a string, the matching part is replaced with "travel", and likewise for "leasure".
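Note that str_replace_all() replaces only the part of each string that matches, not the whole string. For names that are exactly one of the listed strings, as in this data, that makes no difference, but a longer name such as "NETFLIX.COM in a long string" would come back as "leasure in a long string", and the ifelse() step would copy that into new_name. The sketch below imitates the named-vector replacement with base R's gsub(), applying each pattern/replacement pair in turn. This is an approximation of what str_replace_all() does with a named vector, used here so the example needs no packages; replace_each() is a hypothetical helper.

```r
travel  <- c("travel", "air_com", "AIRCAT", "tivago")
leasure <- c("leasure", "MTV", "NETFLIX.COM")
my_list <- list(travel = travel, leasure = leasure)

# Named vector as in the answer: names are patterns, values are replacements.
replacements <- setNames(names(my_list),
                         sapply(my_list, paste0, collapse = "|"))

# Hypothetical base-R approximation: apply each pattern -> replacement in turn.
replace_each <- function(x, repl) {
  for (p in names(repl)) x <- gsub(p, repl[[p]], x)
  x
}

x <- c("NETFLIX.COM", "NETFLIX.COM in a long string", "smv")
replace_each(x, replacements)
# "leasure"  "leasure in a long string"  "smv"
```

If a partial match should map to the label alone, a grepl()-based lookup, as in the first answer, returns the label itself rather than an edited copy of the name.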