byCoder

Reputation: 9184

Detecting similar strings in Ruby.

In my database, the art_groups_arr array holds entries such as Тормозной диск ("brake disc") and Диски тормозные LPR ("brake discs LPR"). I would like to find all the entries similar to Тормозной диск, such as Диски тормозные LPR.

This code:

art_groups_arr.each do |artgrarr|
  if n2.art_group.include?(artgrarr)
    non_original << n2
  end
end

does not find them, obviously. How can I find those similar strings?

Upvotes: 0

Views: 1538

Answers (1)

Boris Stitnicky

Reputation: 12578

You can perhaps use regex, for example:

art_groups_arr.each do |art_gr_arr|
  if n2.art_group.any? { |element|
    # /i makes the match case-insensitive, so "Диски" matches as well as "диск"
    /тормозн/i =~ element && /диск/i =~ element
  } then non_original << n2 end
end
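As a self-contained illustration of the regex idea, here is a minimal sketch that filters a plain array of strings. The names STEMS and matches_all_stems? and the sample entries are hypothetical, not taken from your code, and it assumes Ruby 2.4 or later for Regexp#match?.

```ruby
# Hypothetical stems: each regex is case-insensitive (/i), so both
# "Тормозной" and "тормозные", "диск" and "Диски" are accepted.
STEMS = [/тормозн/i, /диск/i].freeze

# True when every stem occurs somewhere in the string.
def matches_all_stems?(str, stems = STEMS)
  stems.all? { |re| re.match?(str) }
end

# Sample data, assumed for illustration only.
entries = ['Тормозной диск', 'Диски тормозные LPR', 'Масляный фильтр']
similar = entries.select { |entry| matches_all_stems?(entry) }
```

Here similar keeps the two brake-disc entries and drops the unrelated one. The drawback is that you have to choose the stems by hand for every group of names.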

Alternatively, you can try the fuzz_ball gem, which claims to implement the Smith-Waterman algorithm.

require 'fuzz_ball'
THRESHOLD_SCORE = 0.75
MATCHER = FuzzBall::Searcher.new [ 'Тормозной диск LPR' ]

def complies?( str )
  matchdata = MATCHER.search str
  return false if matchdata.nil? or matchdata.empty?
  score = matchdata[0][:score]
  puts "score is #{score}"
  score > THRESHOLD_SCORE
end

art_groups_arr.each do |art_gr_arr|
  if n2.art_group.any? { |element| complies? element } then
    non_original << n2
  end
end

For 'Диски тормозные LPR' this gives a score of 0.861; you will have to tune the threshold for your data.
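If you would rather not depend on a gem, a rough similarity score can also be computed in plain Ruby. The sketch below is not Smith-Waterman and not what fuzz_ball does: it swaps in a simpler technique, the Dice coefficient over character bigrams taken word by word, so word order does not matter. The method names bigrams and dice_similarity are hypothetical, and it assumes Ruby 2.4 or later so that String#downcase handles Cyrillic.

```ruby
# Character bigrams of each word, lowercased so "Диски" and "диск" share pairs.
def bigrams(str)
  str.downcase.scan(/[[:alpha:]]+/).flat_map do |word|
    word.chars.each_cons(2).map(&:join)
  end
end

# Dice coefficient: 2 * shared bigrams / total bigrams, between 0.0 and 1.0.
def dice_similarity(a, b)
  pairs_a = bigrams(a)
  pairs_b = bigrams(b)
  return 0.0 if pairs_a.empty? || pairs_b.empty?

  remaining = pairs_b.dup
  # Count each bigram of b at most once by removing it after a match.
  shared = pairs_a.count do |pair|
    idx = remaining.index(pair)
    idx && remaining.delete_at(idx)
  end
  2.0 * shared / (pairs_a.size + pairs_b.size)
end

score = dice_similarity('Тормозной диск', 'Диски тормозные LPR')
unrelated = dice_similarity('Тормозной диск', 'Масляный фильтр')
```

The brake-disc pair scores far higher than the unrelated pair, so a threshold between the two separates them. As with fuzz_ball, that threshold has to be tuned against your real data.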

Upvotes: 1
