Reputation: 9184
In my db there are entries such as Тормозной диск ("brake disc"), Диски тормозные LPR ("LPR brake discs"), etc., stored in the art_groups_arr array. I would like to find all the entries similar to Тормозной диск, such as Диски тормозные LPR. This code:
art_groups_arr.each do |artgrarr|
  if n2.art_group.include?(artgrarr)
    non_original << n2
  end
end
does not find them, obviously. How can I find those similar strings?
Upvotes: 0
Views: 1538
Reputation: 12578
You can perhaps use regex, for example:
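art_groups_arr.each do |art_gr_arr|
  # /ормозн/ skips the first letter, so its case does not matter;
  # /диск/i is case-insensitive, so the capitalised "Диски" matches too.
  if n2.art_group.any? { |element| /ормозн/ =~ element and /диск/i =~ element }
    non_original << n2
  end
end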
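To try the stem-matching idea in isolation, here is a minimal self-contained sketch. The sample array and the helper name brake_disc? are made up for illustration and are not part of the question's code; the i flag makes both patterns case-insensitive.

```ruby
# Hypothetical sample data; in the question these strings come from the database.
art_groups = ['Тормозной диск', 'Диски тормозные LPR',
              'Колодки тормозные', 'Фильтр масляный']

# Hypothetical helper: true when the string contains both word stems,
# regardless of case, word order, or grammatical ending.
def brake_disc?(str)
  str.match?(/тормозн/i) && str.match?(/диск/i)
end

similar = art_groups.select { |name| brake_disc?(name) }
puts similar
```

This only works when you know the stems in advance; for arbitrary input a similarity score is the more general route.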
Alternatively, you can try the fuzz_ball gem, which claims to implement the Smith-Waterman algorithm.
require 'fuzz_ball'

THRESHOLD_SCORE = 0.75
MATCHER = FuzzBall::Searcher.new [ 'Тормозной диск LPR' ]

def complies?( str )
  matchdata = MATCHER.search str
  return false if matchdata.nil? or matchdata.empty?
  score = matchdata[0][:score]
  puts "score is #{score}"
  score > THRESHOLD_SCORE
end

art_groups_arr.each do |art_gr_arr|
  if n2.art_group.any? { |element| complies? element } then
    non_original << n2
  end
end
For 'Диски тормозные LPR' you get a score of 0.861, so you will have to tune the threshold for your data.
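If you want to see how such scores behave before settling on a threshold, a rough pure-Ruby sketch of Smith-Waterman local alignment scoring can help. This is not fuzz_ball's implementation: the method names, the scoring values (match +2, mismatch -1, gap -1) and the normalisation are all assumptions chosen for illustration, so the numbers will differ from the gem's.

```ruby
# Smith-Waterman local alignment score between two strings.
# Scoring values are illustrative assumptions, not fuzz_ball's.
def sw_score(a, b, match: 2, mismatch: -1, gap: -1)
  a = a.downcase.chars
  b = b.downcase.chars
  prev = Array.new(b.size + 1, 0) # previous row of the scoring matrix
  best = 0
  a.each do |ca|
    curr = [0]                    # column 0 is always 0
    b.each_with_index do |cb, j|
      diag = prev[j] + (ca == cb ? match : mismatch)
      up   = prev[j + 1] + gap
      left = curr[j] + gap
      cell = [0, diag, up, left].max # local alignment never drops below 0
      curr << cell
      best = cell if cell > best
    end
    prev = curr
  end
  best
end

# Normalise to 0..1 by the best score the shorter string could reach.
def similarity(a, b, match: 2)
  shorter = [a.length, b.length].min
  return 0.0 if shorter.zero?
  sw_score(a, b, match: match).to_f / (match * shorter)
end

# Print scores for a few hypothetical entries to see where a threshold could sit.
query = 'Тормозной диск'
['Тормозной диск LPR', 'Диски тормозные LPR', 'Фильтр масляный'].each do |name|
  puts format('%-22s %.3f', name, similarity(query, name))
end
```

Run it against entries you know are similar and entries you know are not, then pick a threshold that falls between the two groups.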
Upvotes: 1