Caio Tarifa

Reputation: 6043

Measure the distance between two strings with Ruby?

Can I measure the distance between two strings with Ruby?

For example:

compare('Test', 'est') # Returns 1
compare('Test', 'Tes') # Returns 1
compare('Test', 'Tast') # Returns 1
compare('Test', 'Taste') # Returns 2
compare('Test', 'tazT') # Returns 5

Upvotes: 33

Views: 18587

Answers (7)

pfac

Reputation: 830

Ruby 2.3 and later ship with the did_you_mean gem, which includes DidYouMean::Levenshtein.distance. It is fit for most cases and available by default.

DidYouMean::Levenshtein.distance("Test", "est") # => 1
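As a quick check against the examples in the question, here is a minimal sketch that wraps the call in a `compare` helper. The helper name is borrowed from the question and is not part of did_you_mean. The explicit require should only matter if the gem was disabled at startup. Note that plain Levenshtein distance is case-sensitive and gives 4 for the last pair, since four substitutions are enough.

```ruby
require 'did_you_mean'

# Hypothetical helper named after the question's example; it simply
# delegates to the Levenshtein implementation bundled with did_you_mean.
def compare(a, b)
  DidYouMean::Levenshtein.distance(a, b)
end

puts compare('Test', 'est')   # one deletion
puts compare('Test', 'Taste') # one substitution plus one insertion
puts compare('Test', 'tazT')  # four substitutions (case matters)
```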

Upvotes: 28

Hula_Zell

Reputation: 1250

I like DigitalRoss' solution. However, as pointed out by dawg, its runtime grows on the order of O(3^n), which is no good for longer strings. That solution can be sped up significantly using memoization (top-down dynamic programming):

def lev(string1, string2, memo={})
  return memo[[string1, string2]] if memo[[string1, string2]]
  return string2.size if string1.empty?
  return string1.size if string2.empty?
  min = [ lev(string1.chop, string2, memo) + 1,
          lev(string1, string2.chop, memo) + 1,
          lev(string1.chop, string2.chop, memo) + (string1[-1] == string2[-1] ? 0 : 1)
       ].min
  memo[[string1, string2]] = min
  min
end

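We then have much better runtime. It is not linear, though: for strings of length m and n the memo holds at most (m+1)(n+1) distinct subproblems, so the work grows roughly quadratically instead of exponentially.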

[9] pry(main)> require 'benchmark'
=> true
[10] pry(main)> @memo = {}
=> {}
[11] pry(main)> Benchmark.realtime{puts lev("Hello darkness my old friend", "I've come to talk with you again")}
26
=> 0.007071999832987785
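To make the effect of the memo visible, the sketch below counts how many distinct subproblems get solved. It uses a slightly adapted copy of the function (renamed `lev_memo`, and also storing the base cases so they are counted); both the name and the test strings are my own choices for illustration.

```ruby
# Adapted copy of the memoized recursion, repeated so the snippet
# runs on its own. Base cases are stored in the memo as well.
def lev_memo(a, b, memo = {})
  key = [a, b]
  return memo[key] if memo.key?(key)
  return memo[key] = b.size if a.empty?
  return memo[key] = a.size if b.empty?

  cost = a[-1] == b[-1] ? 0 : 1
  memo[key] = [
    lev_memo(a.chop, b, memo) + 1,       # drop last char of a
    lev_memo(a, b.chop, memo) + 1,       # drop last char of b
    lev_memo(a.chop, b.chop, memo) + cost # match or substitute
  ].min
end

a = "kitten"
b = "sitting"
memo = {}
distance = lev_memo(a, b, memo)
bound = (a.size + 1) * (b.size + 1)

puts distance   # Levenshtein distance of the pair
puts memo.size  # number of distinct subproblems solved
puts bound      # upper bound on that number: (m+1)(n+1)
```

Every key in the memo is a pair of prefixes, one from each string, which is why the count can never exceed the bound.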

Upvotes: 3

Nakilon

Reputation: 35102

There is a utility method in RubyGems that really should be public but is not. You can still reach it like this:

require "rubygems/text"
ld = Class.new.extend(Gem::Text).method(:levenshtein_distance)

p ld.call("asd", "sdf") # => 2

Upvotes: 28

dimus

Reputation: 9390

I made a damerau-levenshtein gem in which the algorithms are implemented in C:

require "damerau-levenshtein"
dl = DamerauLevenshtein
dl.distance("Something", "Smoething") # => 1

Upvotes: 5

DigitalRoss

Reputation: 146221

Much simpler, I'm a Ruby show-off at times...

# Levenshtein distance, translated from wikipedia pseudocode by ross

def lev s, t
  return t.size if s.empty?
  return s.size if t.empty?
  return [ (lev s.chop, t) + 1,
           (lev s, t.chop) + 1,
           (lev s.chop, t.chop) + (s[-1, 1] == t[-1, 1] ? 0 : 1)
       ].min
end

Upvotes: 17

Michael F

Reputation: 1409

Much easier, and fast thanks to the native C binding:

gem install levenshtein-ffi
gem install levenshtein

require 'levenshtein'

Levenshtein.normalized_distance string1, string2, threshold

http://rubygems.org/gems/levenshtein
http://rubydoc.info/gems/levenshtein/0.2.2/frames

Upvotes: 32

Hugo Demiglio

Reputation: 1589

I found this for you:

def levenshtein_distance(s, t)
  m = s.length
  n = t.length
  return m if n == 0
  return n if m == 0
  d = Array.new(m+1) {Array.new(n+1)}

  (0..m).each {|i| d[i][0] = i}
  (0..n).each {|j| d[0][j] = j}
  (1..n).each do |j|
    (1..m).each do |i|
      d[i][j] = if s[i-1] == t[j-1]  # adjust index into string
                  d[i-1][j-1]       # no operation required
                else
                  [ d[i-1][j]+1,    # deletion
                    d[i][j-1]+1,    # insertion
                    d[i-1][j-1]+1,  # substitution
                  ].min
                end
    end
  end
  d[m][n]
end

[ ['fire','water'], ['amazing','horse'], ["bamerindos", "giromba"] ].each do |s,t|
  puts "levenshtein_distance('#{s}', '#{t}') = #{levenshtein_distance(s, t)}"
end

That gives this awesome output: =)

levenshtein_distance('fire', 'water') = 4
levenshtein_distance('amazing', 'horse') = 7
levenshtein_distance('bamerindos', 'giromba') = 9

Source: http://rosettacode.org/wiki/Levenshtein_distance#Ruby
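Since each row of the matrix only depends on the row before it, the same table can be filled with two rows instead of the full (m+1) x (n+1) array. The following is a rough sketch of that idea, not part of the Rosetta Code version; the method name `levenshtein_two_rows` is made up for this example.

```ruby
# Two-row variant: prev holds row i-1 of the matrix, curr holds row i.
def levenshtein_two_rows(s, t)
  return t.length if s.empty?
  return s.length if t.empty?

  prev = (0..t.length).to_a          # distances from "" to each prefix of t
  s.each_char.with_index(1) do |sc, i|
    curr = [i]                       # distance from s[0, i] to ""
    t.each_char.with_index(1) do |tc, j|
      cost = sc == tc ? 0 : 1
      curr << [prev[j] + 1,          # deletion
               curr[j - 1] + 1,      # insertion
               prev[j - 1] + cost    # substitution, or match when cost is 0
              ].min
    end
    prev = curr
  end
  prev.last
end

puts levenshtein_two_rows('fire', 'water') # => 4
```

It should return the same values as the full-matrix version while using memory proportional to the length of the second string only.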

Upvotes: 29
