toy
toy

Reputation: 12141

How do I match a longer string to shorter word or string

I have a database of items with tags, such as:

If I match the string:

"Today I would like to eat pork with apple sauce, it would fill me up"

against the tags, I would get three results. However, I just want to get the most specific one, which in this case would be item1.

This is just an example and i'm not using a particular database. Just string and map in ruby. I came up with "fuzzy search". I'm not sure if this is correct. Can anybody suggest how to solve this particular problem?

Upvotes: 1

Views: 257

Answers (4)

Index
Index

Reputation: 2381

This will apply to general programming and not Ruby in particular.

I would tokenize both Strings, that is both the needle and the haystack and then loop trough them both while counting number of occurens. Then finally compare scores.

Some sudo code:

needle[] = array of tokens from keysentence
haystack[] array of tokens from search string
int score = 0

do {
  haystackToken = haystack's next token

  do {
    needleToken = needle's next token

    if (haystackToken equals needleToken)
      score += 1

   } while(needle has more token)

} while (haystack has more tokens)

Upvotes: 0

doesterr
doesterr

Reputation: 3965

Depending on your specific use case, you may not need a fuzzy search.

Maybe a very basic implementation like this is sufficient for you:

class Search
  attr_reader :items, :string

  def initialize(items, string)
    @items  = items
    @string = string.downcase
  end

  def best_match
    items.max_by { |item| rate(item) }
  end

  private

  def rate(item)
    tag_list(item).count { |tag| string.include?(tag) }
  end

  def tag_list(item)
    item[:tags].split(" ")
  end
end

 

items = [
  { id: :item1, tags: "pork with apple sauce" },
  { id: :item2, tags: "pork" },
  { id: :item3, tags: "apple sauce" }
]

string = "Today I would like to eat pork with apple sauce, it would fill me up"

Search.new(items, string).best_match
#=> {:id=>:item1, :tags=>"pork with apple sauce"}

Upvotes: 2

sawa
sawa

Reputation: 168091

The order or specifity among the items in your database is determined before you match them with a string. You do not make it clear in the question, but I suppose what you have in mind is the length. So, suppose you have the data as a hash:

h = {
  item1: "pork with apple sauce",
  item2: "pork",
  item3: "apple sauce",
}

Then, you can sort this by the length of the tag so that a longer one comes first in the list. At the same time, you can convert the tags into regexes so that you don't need to worry about variation in space. Then, you would have an array like this:

a =
h
.sort_by{|_, s| s.length}.reverse
.map{|k, s| [k, Regexp.new("\\b#{s.gsub(/\s+/, '\\s+')}\\b")]}
# =>
# [
#   [
#     :item1,
#     /\bpork\s+with\s+apple\s+sauce\b/
#   ],
#   [
#     :item3,
#     /\bapple\s+sauce\b/
#   ],
#   [
#     :item2,
#     /\bpork\b/
#   ]
# ]

Once you have this, you just need to find the first item in the list that matches with the string.

s = "Today I would like to eat pork with apple sauce, it would fill me up"

a.find{|_, r| s =~ r}[0]
# => :item1

Upvotes: 1

epidemian
epidemian

Reputation: 19219

Yes, you need to do a fuzzy match (aka approximate match). It is quite a well known problem, and implementing an approximate matching algorithm by hand is not an easy task (but I'm sure it's very fun! =D). There are lots of things that can affect how "similar" two strings, A and B, are, depending on what things you consider important, like how many times A appears in B, or how close the order and distance between the words in A appear in B, or if the "important" words in A appear in B, etc.

If you can get by with an existing library, there seems to be a couple of Ruby gems that can get the job done. For example, using this one called fuzzy-string-match, which uses the Jaro-Winkler distance ported from Lucene (a Java library... it also seems to have preserved the Java convention of camelCased method names ¬¬):

require 'fuzzystringmatch'

matcher = FuzzyStringMatch::JaroWinkler.create(:pure)

tags = ["pork with apple sauce", "pork", "apple sauce"]
input = "Today I would like to eat pork with apple sauce, it would fill me up"

# Select the tag by distance to the input string (distance == 1 means perfect 
# match)
best_tag = tags.max_by { |tag| matcher.getDistance(tag, input) }

p best_tag 

Will correctly select "pork with apple sauce".

There's also this other gem called amatch that has many other approximate matching algorithms.

Upvotes: 3

Related Questions