Reputation: 493
I have a master list of colors, shown below:
master <- list("Beige" = c("light brown", "light golden", "skin"),
"off-white" = c("off white", "cream", "light cream", "dirty white"),
"Metallic" = c("steel","silver"),
"Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
"Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
"Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry","rosered"),
"Turquoise" = c("aqua marine", "jade green"),
"Yellow" = c("fresh lime")
)
And this is the data frame column that I have:
df$color <- c('multi color','purple','steel','metallic','off white','raisin','strawberry','magenta','skin','Beige','Jade Green','cream','multi-colored','offwhite','rosered',"light cream")
Now I want to check whether each value present in the column is the same as a list key or the same as one of the list values.
For example:
1) If the df column value is off white, it should first look at the list keys, which are Beige, off-white, Metallic, ... If the value is present there, then return that key.
2) It should also look at all the values that those keys have. For example, if one of a key's values is light cream, then light cream should be considered off-white.
3) Case should not matter, e.g. OffWhITe == offwhite, and spaces should not matter, e.g. off white == offwhite.
OUTPUT
This should be the expected output
df$output <- c("Multi-colored","Purple","Metallic","Metallic","off-white","Purple","Red","Purple","Beige","Beige","Turquoise","off-white","Multi-colored","off-white","Red","off-white")
EDIT
Any value in c("multi color", "mixed colors", "mix", "rainbow","multicolored","MultI-cOlored","multi-colored","MultiColORed","Multi-colored")
should be considered Multi-colored.
Upvotes: 2
Views: 1267
Reputation: 887118
Maybe we can do a stringdist join (stringdist_right_join from fuzzyjoin) after stacking the list into a single data.frame:
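library(dplyr)
library(tidyr)      # needed for unnest()
library(fuzzyjoin)
library(tibble)
# stack the list into a two-column tibble (name = key, color = value)
enframe(master, value = 'color') %>%
unnest(c(color)) %>%
type.convert(as.is = TRUE) %>%
# fuzzy right join on 'color', keeping every row of df
stringdist_right_join(df %>%
mutate(rn = row_number()), max_dist = 3) %>%
# fall back to the original value where nothing matched
transmute(color = color.y, output = coalesce(name, color.y))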
# A tibble: 19 x 2
# color output
# <chr> <chr>
# 1 multi color Multi-colored
# 2 purple purple
# 3 steel Metallic
# 4 metallic metallic
# 5 off white off-white
# 6 raisin Purple
# 7 strawberry Red
# 8 strawberry Red
# 9 magenta Purple
#10 skin Beige
#11 skin Multi-colored
#12 Beige Beige
#13 Jade Green Turquoise
#14 cream off-white
#15 cream Purple
#16 multi-colored Multi-colored
#17 offwhite off-white
#18 rosered Red
#19 light cream off-white
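Note that the result has 19 rows for 16 inputs: with max_dist = 3, an input can fall within distance 3 of values under more than one key. A rough way to see this is base R's adist(), which computes generalized Levenshtein distance. This is only a sketch: fuzzyjoin's default distance method may differ from adist() in some cases, and the variable names here are made up for illustration.

```r
# Illustrative check, not part of the join above.
inputs   <- c("skin", "cream")
# One value each from Beige, Multi-colored, off-white and Purple in `master`.
synonyms <- c("skin", "mix", "cream", "jam")

# Edit distance between every input and every synonym.
d <- adist(inputs, synonyms)
dimnames(d) <- list(inputs, synonyms)

# TRUE wherever a pair would be accepted with max_dist = 3.
within_max_dist <- d <= 3
print(d)
print(within_max_dist)
```

Lowering max_dist reduces such extra matches, at the cost of missing looser spellings.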
df <- structure(list(color = c("multi color", "purple", "steel", "metallic",
"off white", "raisin", "strawberry", "magenta", "skin", "Beige",
"Jade Green", "cream", "multi-colored", "offwhite", "rosered",
"light cream")), class = "data.frame", row.names = c(NA, -16L
))
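If exact matching after normalisation is enough (the question asks to ignore case and spaces), a plain base R lookup may be simpler than a fuzzy join. A minimal sketch, assuming hyphens should be ignored too; norm, lookup_from and lookup_to are names invented for this example.

```r
master <- list(
  "Beige" = c("light brown", "light golden", "skin"),
  "off-white" = c("off white", "cream", "light cream", "dirty white"),
  "Metallic" = c("steel", "silver"),
  "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
  "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
  "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
  "Turquoise" = c("aqua marine", "jade green"),
  "Yellow" = c("fresh lime")
)
df <- data.frame(
  color = c("multi color", "purple", "steel", "metallic", "off white", "raisin",
            "strawberry", "magenta", "skin", "Beige", "Jade Green", "cream",
            "multi-colored", "offwhite", "rosered", "light cream"),
  stringsAsFactors = FALSE
)

# Lower-case and drop everything except letters and digits, so that
# "Off White", "off-white" and "offwhite" all become "offwhite".
norm <- function(x) gsub("[^a-z0-9]", "", tolower(x))

# Lookup vectors: the keys map to themselves, every value maps to its key.
lookup_from <- norm(c(names(master), unlist(master, use.names = FALSE)))
lookup_to   <- c(names(master), rep(names(master), lengths(master)))

# match() returns the first hit; inputs with no match become NA.
df$output <- lookup_to[match(norm(df$color), lookup_from)]
```

On the sample data this gives the expected output from the question. Values that match neither a key nor a listed value come back as NA, so misspellings would still need a fuzzy step.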
Upvotes: 1