AnApprentice

Reputation: 111090

How to remove duplicates in a hash in Ruby on Rails?

I have an array of hashes like so:

[
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

What I would like to learn is how to remove a record if it is a duplicate. For example, several records share the email "[email protected]". How can I remove the extra ones so that only one record per email remains? In other words, how do I treat email as the key and ignore the other fields?

Upvotes: 13

Views: 17634

Answers (4)

Harish Shetty

Reputation: 64373

I know this is an old thread, but Rails adds a method to Enumerable called index_by, which can be handy in this case:

list = [
  {
    :lname => "Brown",
    :email => "[email protected]",
    :fname => "James"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  },
  {
    :lname => "Smith",
    :email => "[email protected]",
    :fname => "Brad"
  },
  {
    :lname => nil,
    :email => "[email protected]",
    :fname => nil
  }
]

Now you can get the unique rows as follows:

list.index_by {|r| r[:email]}.values
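
Note that index_by comes from ActiveSupport, so it is not available in plain Ruby. As a rough sketch of what it does here, the same result can be built with each_with_object. The addresses below are invented for illustration (the ones in the question are obscured). A later row replaces an earlier one with the same email, so the last duplicate wins:

```ruby
# Sample rows; the addresses are made up for this example.
rows = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

# Plain-Ruby stand-in for rows.index_by { |r| r[:email] }:
# each row is stored under its email, replacing any earlier row.
by_email = rows.each_with_object({}) { |r, acc| acc[r[:email]] = r }

# One row per email; for brad@example.com it is the last row seen.
unique_rows = by_email.values
```

If the row you want to keep is not the last one, this approach may discard the names, which is where the merging variants come in.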

To merge the rows that share the same email instead:

list.group_by{|r| r[:email]}.map do |k, v|
  v.inject({}) { |r, h| r.merge(h){ |key, o, n| o || n } }
end

A custom but efficient method (a single pass over the list):

list.inject({}) do |r, h| 
  (r[h[:email]] ||= {}).merge!(h){ |key, old, new| old || new }
  r
end.values
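
To illustrate the merge behaviour, here is a small sketch of the group_by version run on invented rows (the addresses and variable names are assumptions, not data from the question). The block passed to merge keeps the existing value unless it is nil, so names from any duplicate fill in the blanks:

```ruby
# Sample rows; the addresses are made up for this example.
rows = [
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" },
  { :lname => "Brown", :email => "james@example.com", :fname => "James" }
]

merged = rows.group_by { |r| r[:email] }.map do |_email, dups|
  # Fold the duplicates into one hash, keeping the first non-nil value per key.
  dups.inject({}) { |acc, h| acc.merge(h) { |_key, old, new| old || new } }
end
```

Here merged ends up with one row per email, and the row for brad@example.com carries "Brad" and "Smith" even though the first duplicate had nil names.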

Upvotes: 19

DigitalRoss

Reputation: 146281

Ok, this (delete duplicates) is what you asked for:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? m : m << e }

But I think this (merge values) is what you want:

a.sort_by { |e| e[:email] }.inject([]) { |m,e| m.last.nil? ? [e] : m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) : m << e }

Perhaps I'm stretching the one-liner idea a bit unreasonably, so with different formatting and a test case:

Aiko:so ross$ cat mergedups
require 'pp'

a = [{:fname=>"James", :lname=>"Brown", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>nil,     :lname=>nil,     :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"},
     {:fname=>"Brad",  :lname=>"Smith", :email=>"[email protected]"}]

pp(
  a.sort_by { |e| e[:email] }.inject([]) do |m,e|
    m.last.nil? ? [e] :
      m.last[:email] == e[:email] ? (m.last.merge!(e) { |k,o,n| o || n }; m) :
        m << e
  end
)
Aiko:so ross$ ruby mergedups
[{:email=>"[email protected]", :fname=>"Brad", :lname=>"Smith"},
 {:email=>"[email protected]", :fname=>"James", :lname=>"Brown"}]

Upvotes: 1

Andrew Marshall

Reputation: 97024

If you're putting this directly into the database, just use validates_uniqueness_of :email in your model. See the Rails documentation for validates_uniqueness_of.

If you need to remove them from the actual array before it is used, then do:

emails = []  # This is a temporary array, not your results. The results are still in my_array
my_array.delete_if do |item|
  if emails.include? item[:email]
    true
  else
    emails << item[:email]
    false
  end
end
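
One caveat: emails.include? scans the temporary array on every iteration, which can get slow for long lists. A possible variation, sketched here with invented addresses, tracks the emails already seen in a Set from the standard library instead:

```ruby
require 'set'

# Sample rows; the addresses are made up for this example.
my_array = [
  { :lname => "Brown", :email => "james@example.com", :fname => "James" },
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" }
]

seen = Set.new
# Set#add? returns nil when the value is already present, so a row is
# deleted exactly when its email has been seen before.
my_array.delete_if { |item| !seen.add?(item[:email]) }
```

As with the original loop, the first row for each email is the one that survives, and my_array is modified in place.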

UPDATE:

This will merge the contents of duplicate entries:

merged_list = {}
my_array.each do |item|
  if merged_list.has_key? item[:email]
    # Keep an existing non-nil value; otherwise take the new one
    merged_list[item[:email]].merge!(item) { |key, old, new| old || new }
  else
    merged_list[item[:email]] = item
  end
end
my_array = merged_list.values

Upvotes: 6

dnch

Reputation: 9605

As of Ruby 1.9.2, Array#uniq accepts a block, which it uses to decide which objects count as duplicates:

arrays.uniq { |h| h[:email] }
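
Keep in mind that uniq keeps the first element it sees for each block value. A minimal sketch with invented addresses shows this, along with one possible way to prefer the most complete row by moving rows with fewer nil values to the front first (that sorting step is my own assumption about what is wanted, not part of uniq):

```ruby
# Sample rows; the addresses are made up for this example.
rows = [
  { :lname => nil,     :email => "brad@example.com",  :fname => nil },
  { :lname => "Smith", :email => "brad@example.com",  :fname => "Brad" },
  { :lname => "Brown", :email => "james@example.com", :fname => "James" }
]

# uniq keeps the first row for each email, here the one with nil names.
first_wins = rows.uniq { |h| h[:email] }

# Sorting by the number of nil values puts the fullest rows first, so uniq
# keeps those. sort_by is not stable, so the order of the result may differ.
most_complete = rows.sort_by { |h| h.values.count(&:nil?) }.uniq { |h| h[:email] }
```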

Upvotes: 26
