james joyce

Reputation: 493

Check if dataframe column value is present in list in R

I have a master list of colors, shown below:

master <- list("Beige" = c("light brown", "light golden", "skin"),
                      "off-white" = c("off white", "cream", "light cream", "dirty white"),
                      "Metallic" = c("steel","silver"),
                      "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
                      "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
                      "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry","rosered"),
                      "Turquoise" = c("aqua marine", "jade green"),
                      "Yellow" = c("fresh lime")
                     )

and this is the data frame column that I have:

df$color <- c('multi color','purple','steel','metallic','off white','raisin','strawberry','magenta','skin','Beige','Jade Green','cream','multi-colored','offwhite','rosered',"light cream")

Now I want to check whether each value in the column matches either a list key or one of the list values.

For example:
1) If the column value is "off white", it should first be compared against the list keys (Beige, off-white, Metallic, ...); if it matches a key, use that key.
2) It should also be compared against all the values under each key; for example, "light cream" is one of the values of off-white, so it should be treated as off-white.
3) Matching should ignore case (OffWhITe == offwhite) and spaces (off white == offwhite).

OUTPUT
This is the expected output:

df$output <- c("Multi-colored","Purple","Metallic","Metallic","off-white","Purple","Red","Purple","Beige","Beige","Turquoise","off-white","Multi-colored","off-white","Red","off-white")

EDIT
Any value in c("multi color", "mixed colors", "mix", "rainbow","multicolored","MultI-cOlored","multi-colored","MultiColORed","Multi-colored") should be treated as Multi-colored.

Upvotes: 2

Views: 1267

Answers (1)

akrun

Reputation: 887118

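Maybe we can do a fuzzy join with stringdist_right_join (from fuzzyjoin) after stacking the list into a single data.frame:

library(dplyr)
library(tidyr)      # provides unnest()
library(fuzzyjoin)
library(tibble)
# Stack the list into a two-column tibble (name = key, color = value)
enframe(master, value = 'color') %>%
      unnest(c(color)) %>% 
      type.convert(as.is = TRUE) %>% 
      # Fuzzy right join on the shared 'color' column, string distance of at most 3
      stringdist_right_join(df %>%
             mutate(rn = row_number()), max_dist = 3) %>% 
      # Fall back to the original value when no key was matched
      transmute(color = color.y, output = coalesce(name, color.y))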
# A tibble: 19 x 2
#   color         output       
#   <chr>         <chr>        
# 1 multi color   Multi-colored
# 2 purple        purple       
# 3 steel         Metallic     
# 4 metallic      metallic     
# 5 off white     off-white    
# 6 raisin        Purple       
# 7 strawberry    Red          
# 8 strawberry    Red          
# 9 magenta       Purple       
#10 skin          Beige        
#11 skin          Multi-colored
#12 Beige         Beige        
#13 Jade Green    Turquoise    
#14 cream         off-white    
#15 cream         Purple       
#16 multi-colored Multi-colored
#17 offwhite      off-white    
#18 rosered       Red          
#19 light cream   off-white    

data

df <- structure(list(color = c("multi color", "purple", "steel", "metallic", 
"off white", "raisin", "strawberry", "magenta", "skin", "Beige", 
"Jade Green", "cream", "multi-colored", "offwhite", "rosered", 
"light cream")), class = "data.frame", row.names = c(NA, -16L
))
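
The fuzzy join above returns 19 rows for 16 inputs and leaves 'purple' and 'metallic' unmapped, because only the list values (not the keys) are joined and several candidates can fall within the distance threshold. If exact matching after ignoring case, spaces, and punctuation is enough, a base R lookup may be simpler. This is only a sketch: `normalize`, `lookup`, and `match_color` are hypothetical helper names, not part of any package, and it assumes that stripping every non-alphanumeric character is an acceptable way to compare colors.

```r
master <- list("Beige" = c("light brown", "light golden", "skin"),
               "off-white" = c("off white", "cream", "light cream", "dirty white"),
               "Metallic" = c("steel", "silver"),
               "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
               "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
               "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
               "Turquoise" = c("aqua marine", "jade green"),
               "Yellow" = c("fresh lime"))

# Lower-case and drop everything except letters and digits, so that
# "Off White", "off-white" and "OffWhITe" all become "offwhite".
normalize <- function(x) gsub("[^[:alnum:]]", "", tolower(x))

# Named character vector: normalized key or value -> master key.
# Keys come first, so they are checked before the values.
lookup <- c(
  setNames(names(master), normalize(names(master))),
  setNames(rep(names(master), lengths(master)),
           normalize(unlist(master, use.names = FALSE)))
)

# Hypothetical helper: returns the master key, or NA when nothing matches.
match_color <- function(x) unname(lookup[normalize(x)])

df <- data.frame(color = c("multi color", "purple", "steel", "metallic",
                           "off white", "raisin", "strawberry", "magenta",
                           "skin", "Beige", "Jade Green", "cream",
                           "multi-colored", "offwhite", "rosered", "light cream"),
                 stringsAsFactors = FALSE)
df$output <- match_color(df$color)
```

Values with no match come back as NA, which makes unmapped colors easy to spot; this version does not tolerate typos, so a fuzzy join is still the better fit if misspellings need to be matched.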

Upvotes: 1
