Reputation: 31
I have a CSV file with a column named Phrases, shown below. I would like to assign an object to each phrase based on the object descriptions in a data frame.
[,1]
[1,] Phrases
[2,] sugar fluid
[3,] they are crispy
[4,] its soft
I have a data frame with the following:
[,1] [,2]
[1,] Description, Object
[2,] sweet and delicious, apple
[3,] hard, nuts
[4,] wet and fluid, water
[5,] sugar fluid, coke
[6,] soft, marshmallow
[7,] crispy salty, chips
The output should look like this
[,1] [,2]
[1,] Phrases, Object Assigned
[2,] sugar fluid, coke
[3,] they are crispy, chips
[4,] its soft, marshmallow
Notice that a phrase may not match a description exactly. The object whose description shares the most words with the phrase should be assigned to it.
How do I do this?
Upvotes: 0
Views: 67
Reputation: 12084
Here's a rough solution. First, I create the data frames. (For future reference: it helps a great deal if you provide the data in a copy-and-pastable format, such as the output of dput.)
# Create data frames
df_object <- structure(list(Description = c("sweet and delicious", "hard",
                                            "wet and fluid", "sugar fluid", "soft", "crispy salty"),
                            Object = c("apple", "nuts", "water", "coke", "marshmallow", "chips")),
                       row.names = c(NA, -6L), class = c("data.frame"),
                       .Names = c("Description", "Object"))
df_phrases <- structure(list(Phrases = c("sugar fluid", "they are crispy", "its soft")),
                        row.names = c(NA, -3L), class = c("data.frame"),
                        .Names = "Phrases")
A quick peek at the data frames to make sure they're correct:
# Examine data frames
df_object
#> Description Object
#> 1 sweet and delicious apple
#> 2 hard nuts
#> 3 wet and fluid water
#> 4 sugar fluid coke
#> 5 soft marshmallow
#> 6 crispy salty chips
df_phrases
#> Phrases
#> 1 sugar fluid
#> 2 they are crispy
#> 3 its soft
Next is the meat of the solution: a function, qd, that takes a phrase and compares it to each Description in df_object to find the most similar one. adist makes the comparison and provides a quantitative measure of similarity (an edit distance, so smaller means more similar). which.min finds the position of the smallest value returned by adist (i.e., the most similar description). That position is then used to look up the corresponding Object.
# Quick & dirty function
qd <- function(phrase){
  with(df_object, Object[which.min(adist(phrase, Description, partial = TRUE))])
}
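To see what qd is working with, it can help to look at the distances for a single phrase. The following is only a small sketch: it copies the descriptions and objects into plain vectors so that it runs on its own, and the names descriptions, objects and d are illustrative rather than part of the solution.

```r
# Descriptions and objects copied from df_object above
descriptions <- c("sweet and delicious", "hard", "wet and fluid",
                  "sugar fluid", "soft", "crispy salty")
objects <- c("apple", "nuts", "water", "coke", "marshmallow", "chips")

# adist() returns a matrix: one row per phrase, one column per description
d <- adist("sugar fluid", descriptions, partial = TRUE)
colnames(d) <- objects
d

# The column with the smallest distance is the closest description;
# "sugar fluid" matches its own description exactly (distance 0)
closest <- objects[which.min(d)]
closest
```

Printing d for a phrase that gets the wrong object is usually the quickest way to see why it was picked.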
I then apply this to all Phrases and store the result in Obj_Assigned.
# Apply 'qd' to 'Phrases' and store as 'Obj_Assigned'
df_phrases$Obj_Assigned <- sapply(df_phrases$Phrases, qd)
# Examine results
df_phrases
#> Phrases Obj_Assigned
#> 1 sugar fluid coke
#> 2 they are crispy chips
#> 3 its soft marshmallow
Created on 2019-12-03 by the reprex package (v0.2.1.9000)
The result is as requested. To call this approach flimsy is being generous. adist compares strings character by character rather than counting matching words, so it's easy to break and not especially reliable, but it works for your toy example.
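Since the question asks for the object with the most matched words, one alternative is to count shared words directly instead of using an edit distance. This is only a sketch under some assumptions: words are split on anything that is not a letter, matching is case-insensitive, ties go to the first description, and a phrase with no shared words gets NA. The helpers split_words and match_by_words are hypothetical names, not functions from any package.

```r
# Re-create the data so this block runs on its own
df_object <- data.frame(
  Description = c("sweet and delicious", "hard", "wet and fluid",
                  "sugar fluid", "soft", "crispy salty"),
  Object = c("apple", "nuts", "water", "coke", "marshmallow", "chips"),
  stringsAsFactors = FALSE
)
df_phrases <- data.frame(
  Phrases = c("sugar fluid", "they are crispy", "its soft"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split a string into lower-case words
split_words <- function(x) {
  words <- strsplit(tolower(x), "[^a-z]+")[[1]]
  words[nzchar(words)]  # drop empty strings left by leading separators
}

# Hypothetical helper: return the object whose description shares
# the most words with the phrase (NA if no word is shared)
match_by_words <- function(phrase, descriptions, objects) {
  phrase_words <- split_words(phrase)
  shared <- vapply(descriptions,
                   function(d) length(intersect(phrase_words, split_words(d))),
                   integer(1))
  if (max(shared) == 0) return(NA_character_)
  objects[which.max(shared)]  # which.max() keeps the first of any ties
}

df_phrases$Obj_Assigned <- vapply(df_phrases$Phrases, match_by_words,
                                  character(1),
                                  descriptions = df_object$Description,
                                  objects = df_object$Object,
                                  USE.NAMES = FALSE)
df_phrases
```

On the toy data this assigns coke, chips and marshmallow, the same as qd. It still only matches whole words exactly, so "crisp" would not match "crispy".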
Upvotes: 1