match products in a list in R

Question

I have to classify a list of products like these:

product_list<-data.frame(product=c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow','chicken breast','noodles','salad','chicken salad with egg'))

Based on the words included in each element of this vector:

product_to_match<-c('cow meat','deer meat','cow milk','chicken breast','chicken egg salad','anana')

For each entry of product_to_match, I need to check whether all of its words appear in an element of the data frame.

I am not sure what the best way to do this is. I want to classify each product in a new column, to get something like this:

product_list<-data.frame(product=c('banana from ecuador 1 unit', 'argentinian meat (1 kg) cow','chicken breast','noodles','salad','chicken salad with egg'),class=c(NA,'cow meat','chicken breast',NA,NA,'chicken egg salad'))

Notice that 'anana' did not match 'banana': even though the characters are contained in the string, 'anana' is not a whole word in it. I am not sure how to do this.
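
The difference between substring and whole-word matching described above can be sketched as follows. This is only a minimal illustration, assuming words are separated by whitespace; the names substring_hit, words and word_hit are made up for the example.

```r
# Substring match: 'anana' is found inside 'banana'
substring_hit <- grepl("anana", "banana from ecuador 1 unit", fixed = TRUE)

# Whole-word match: split on whitespace, then compare complete words
words <- strsplit("banana from ecuador 1 unit", "\\s+")[[1]]
word_hit <- "anana" %in% words

print(substring_hit)
print(word_hit)
```

The first check succeeds because it looks at characters, while the second fails because none of the individual words equals 'anana'.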

Thank you.

ThomasIsCoding · Accepted Answer

Perhaps this could help

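# q[i, j] is TRUE when every word of product_to_match[i] occurs in product j
q <- outer(
  strsplit(product_to_match, "\\s+"),
  strsplit(product_list$product, "\\s+"),
  FUN = Vectorize(function(x, y) all(x %in% y))
)
# Recover the matching row index per column; columns with no match become NA
product_list$class <- product_to_match[replace(colSums(q * row(q)), colSums(q) == 0, NA)]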

such that

> product_list
                      product             class
1  banana from ecuador 1 unit              <NA>
2 argentinian meat (1 kg) cow          cow meat
3              chicken breast    chicken breast
4                     noodles              <NA>
5                       salad              <NA>
6      chicken salad with egg chicken egg salad
6      chicken salad with egg chicken egg salad
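
The same idea can also be written step by step, which may be easier to adapt. The sketch below is not the answer's code but an equivalent formulation under the same assumption (words are separated by whitespace); pattern_words, product_words and classify_one are hypothetical names introduced here for illustration.

```r
product_list <- data.frame(
  product = c("banana from ecuador 1 unit", "argentinian meat (1 kg) cow",
              "chicken breast", "noodles", "salad", "chicken salad with egg"),
  stringsAsFactors = FALSE
)
product_to_match <- c("cow meat", "deer meat", "cow milk",
                      "chicken breast", "chicken egg salad", "anana")

# Split both the patterns and the products into whitespace-separated words
pattern_words <- strsplit(product_to_match, "\\s+")
product_words <- strsplit(product_list$product, "\\s+")

# For one product, return the first pattern whose words all occur in it,
# or NA when no pattern qualifies
classify_one <- function(words) {
  hits <- vapply(pattern_words, function(p) all(p %in% words), logical(1))
  if (any(hits)) product_to_match[which(hits)[1]] else NA_character_
}

product_list$class <- vapply(product_words, classify_one, character(1))
print(product_list)
```

Taking the first hit keeps the result well defined when a product satisfies more than one entry of product_to_match, a case in which summing row indices would point at the wrong entry.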
