Kyle Decot

Reputation: 20815

Detecting Possible Duplicates in Rails

I have a Rails 3 application with a model that has a name and a geographic location (lat/lng). How would I go about searching for possible duplicates in my model? I want to create a cron job or something similar that checks whether two objects have a similar name and are less than 0.5 miles away from each other. If both conditions match, we'll flag the objects.

I am using Ruby Geocoder and ThinkingSphinx in my application.

Upvotes: 1

Views: 116

Answers (1)

Max Williams

Reputation: 32933

Levenshtein distance is as good a way as any of judging the similarity of two text strings, i.e. the names.

What I would suggest is to store the latitude and longitude separately (as well as, or instead of, the single "lat;long" string). Then you can run an SQL query to find other records that are within a certain distance, and only THEN run Levenshtein on their names. You want to run the Levenshtein comparison as few times as possible, as it's slow.

Then you could do something like this, assuming your model is called "Place":

class Place < ActiveRecord::Base

  # Places inside a bounding box centred on this one, excluding itself.
  def nearby_places
    range = 0.005 # adjust this to get the proximity you want
    # lat and long are float columns holding the latitude and longitude
    Place.where(
      "id <> ? AND lat > ? AND lat < ? AND long > ? AND long < ?",
      id, lat - range, lat + range, long - range, long + range
    )
  end

  # Nearby places whose names look similar to this one's.
  def similars
    nearby_places.select do |place|
      # Levenshtein logic here: return true if name and place.name are similar according to your criteria
    end
  end

end
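
For the Levenshtein step left open in similars, here is a minimal pure-Ruby sketch. The method names levenshtein and similar_name?, and the 0.25 ratio threshold, are my own assumptions for illustration and not part of any library; tune the threshold to your data.

```ruby
# Dynamic-programming Levenshtein distance, keeping only one row at a time.
def levenshtein(a, b)
  a = a.to_s
  b = b.to_s
  return b.length if a.empty?
  return a.length if b.empty?

  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Hypothetical criterion: names are "similar" when the edit distance is at
# most max_ratio of the longer name's length, ignoring case and outer whitespace.
def similar_name?(a, b, max_ratio = 0.25)
  a = a.to_s.strip.downcase
  b = b.to_s.strip.downcase
  longest = [a.length, b.length].max
  return true if longest.zero?
  levenshtein(a, b).to_f / longest <= max_ratio
end
```

Inside similars, the block body could then be something like similar_name?(name, place.name).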

I've set range to 0.005 but I had no idea what it should be for half a mile. Let's work it out: Google says one degree of latitude is 69.13 miles, so half a mile in degrees would be 1/(69.13 * 2), which gives 0.0072, so not a bad guess :)
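
The same arithmetic as a small sketch, in case you want to compute the range from a distance in miles. The constant and method names are made up for illustration. One extra assumption goes beyond the figure above: a degree of longitude shrinks roughly with the cosine of the latitude, so the longitude range needs to be widened away from the equator.

```ruby
# Miles per degree of latitude, using the figure quoted above.
MILES_PER_DEGREE_LAT = 69.13

# Degrees of latitude that span the given number of miles.
def lat_range_for(miles)
  miles / MILES_PER_DEGREE_LAT
end

# Degrees of longitude that span the given number of miles at a given
# latitude (in degrees). Approximation: one degree of longitude is about
# cos(latitude) times the length of one degree of latitude.
def lng_range_for(miles, lat)
  miles / (MILES_PER_DEGREE_LAT * Math.cos(lat * Math::PI / 180))
end
```

With these, nearby_places could use lat_range_for(0.5) for the latitude bounds and lng_range_for(0.5, lat) for the longitude bounds instead of a single range value.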

Note that my search logic returns places anywhere within a square centred on the current place, roughly a mile per side once range is set to half a mile's worth of degrees (a degree of longitude is shorter than a degree of latitude away from the equator, so the box is really somewhat narrower east to west). This can include more places than a circle of half a mile radius around the current place would, but it's probably fine as a quick way of getting some nearby places.
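
If the extra corners of the square matter, one option is to filter the candidates afterwards by great-circle distance. Below is a rough sketch using the haversine formula; the method name and the Earth radius constant are my own choices, and a geocoding library may already provide an equivalent helper.

```ruby
# Approximate mean Earth radius in miles.
EARTH_RADIUS_MILES = 3958.8

# Great-circle distance in miles between two points given in degrees.
def haversine_miles(lat1, lng1, lat2, lng2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlng = (lng2 - lng1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlng / 2)**2
  2 * EARTH_RADIUS_MILES * Math.asin(Math.sqrt(a))
end
```

For example, nearby_places.select { |p| haversine_miles(lat, long, p.lat, p.long) <= 0.5 } would keep only the places inside the circle. This runs in Ruby on the already-narrowed set, so it should be cheap compared with the name comparison.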

Upvotes: 1
