Luke

Reputation: 5708

Mark and remove duplicates in an array of objects using a specific property for comparison

I have an array of objects that looks like this:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01"
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02"
  },
  .....
]

I need to search through the array and mark duplicates based on the "field name" property. To remove the duplicates, I could use something like uniq { |i| i["field name"] }.
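To illustrate the gap, here is a small sketch with trimmed-down sample data (the names fields and deduped are only for illustration). uniq with a block keeps the first item for each field name and preserves order, but it leaves no trace that a name was repeated:

```ruby
# Hypothetical sample data, trimmed to the keys that matter here.
fields = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" }
]

# uniq with a block compares items by the block's return value and
# keeps the first item seen for each value, in the original order.
deduped = fields.uniq { |i| i["field name"] }

# Only ACTNUM and WARN1 remain; nothing records that "Warning" repeated.
deduped_mnemonics = deduped.map { |i| i["mneumonic"] }
```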

However, for any duplicate items that are found, the item that ends up not being deleted needs to have a property added to it: multiple => true. I do not care which object ends up being the one that stays in the array, so long as it is marked with this property. So, running the function on the example above might produce:

[
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
    "multiple" => true
  },

  .....
]

Besides the removal of duplicates, I also need to be sure that the array's order is not affected by the function.

What is the best way to go about this?

Upvotes: 1

Views: 88

Answers (5)

Cary Swoveland

Reputation: 110675

Provided you are using Ruby v1.9+ (where hashes are guaranteed to maintain key-insertion order), you can use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. Here a is the array given by @sawa.

a.each_with_object({}) do |f,g|
  g.update(f["field name"]=>f) { |_,h| h.merge("multiple"=>true) }
end.values
  #=> [{"field name"=>"Account number", "data type"=>"number",
  #     "mneumonic"=>"ACTNUM", "field number"=>"027"},
  #    {"field name"=>"Warning", "data type"=>"code", "mneumonic"=>"WARN1",
  #     "field number"=>"034:01", "multiple"=>true}] 
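To make the role of the block clearer, here is a minimal sketch of Hash#update on its own. The hash seen and its trimmed-down values are made up for illustration. The block is only called when the key already exists, and it receives the key, the old value and the new value:

```ruby
# Hypothetical starting state: "Warning" has already been seen once.
seen = { "Warning" => { "mneumonic" => "WARN1" } }

# Key not present yet: the block is not called, the pair is simply added.
seen.update("Account number" => { "mneumonic" => "ACTNUM" }) do |_key, old, _new|
  old.merge("multiple" => true)
end

# Key already present: the block's return value replaces the stored value.
# Here the old value is kept (as a marked copy) and the new one is discarded.
seen.update("Warning" => { "mneumonic" => "WARN2" }) do |_key, old, _new|
  old.merge("multiple" => true)
end

# An updated key keeps its original position, so order is preserved.
result = seen.values
```

Because merge returns a new hash, the hashes in the original array are not modified by this approach.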

Upvotes: 0

Matt Brictson

Reputation: 11082

This solution builds a new array with duplicates excluded. For each item in the original array, it checks whether the new array already contains an item with the same field name. If so, it marks that existing item by setting existing["multiple"] = true and skips the current item.

This has the desired effect of omitting duplicates in the new array and marking the originals.

unique_data = data.each_with_object([]) do |item, result|
  if (existing = result.find { |i| i["field name"] == item["field name"] })
    existing["multiple"] = true
    next
  end
  result << item
end
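Note that result.find scans the result array once per item, so the work grows quadratically with the number of distinct names. For large arrays, a hash lookup may be preferable. A sketch under that assumption, where index is a hypothetical helper hash and the sample data is trimmed for brevity:

```ruby
data = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" }
]

# Hypothetical helper: field name => first item seen with that name.
index = {}

unique_data = data.each_with_object([]) do |item, result|
  name = item["field name"]
  if (existing = index[name])
    # Duplicate: mark the first occurrence (this mutates the original hash).
    existing["multiple"] = true
    next
  end
  index[name] = item
  result << item
end
```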

Upvotes: 0

sawa

Reputation: 168091

Using this array:

a = [
  {
    "field name" => "Account number",
    "data type" => "number",
    "mneumonic" => "ACTNUM",
    "field number" => "027",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN1",
    "field number" => "034:01",
  },
  {
    "field name" => "Warning",
    "data type" => "code",
    "mneumonic" => "WARN2",
    "field number" => "034:02",
  },
]

This code:

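field_names = {}
a.select do |h|
  k = h["field name"]
  if field_names[k]
    # Seen before: mark the first occurrence and drop this one.
    field_names[k]["multiple"] = true
    false
  else
    # First occurrence: remember it and keep it.
    field_names[k] = h
    true
  end
end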

will give:

[
  {
    "field name"   => "Account number",
    "data type"    => "number",
    "mneumonic"    => "ACTNUM",
    "field number" => "027"
  },
  {
    "field name"   => "Warning",
    "data type"    => "code",
    "mneumonic"    => "WARN1",
    "field number" => "034:01",
    "multiple"     => true
  }
]
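The code above adds "multiple" to the hashes that are already in a. If the original hashes need to stay untouched, a different technique, group_by combined with merge, is one option. This is only a sketch with trimmed sample data, not a drop-in replacement:

```ruby
a = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" }
]

# group_by keeps groups in order of each key's first appearance.
deduped = a.group_by { |h| h["field name"] }.values.map do |group|
  first = group.first
  # merge returns a new hash, so the hashes in a are not modified.
  group.size > 1 ? first.merge("multiple" => true) : first
end
```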

Upvotes: 1

Piotr Kruczek

Reputation: 2390

Here's a pretty straightforward solution:

array # => your array of objects
used_names = []
multiple_names = []
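# reject! removes the duplicates in place; calling array.delete inside
# array.each would skip the element that follows each deleted one.
array.reject! do |hash|
  name = hash['field name']
  if used_names.include? name
    multiple_names << name
    true
  else
    used_names << name
    false
  end
end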
array.each do |hash|
  if multiple_names.include? hash['field name']
    hash['multiple'] = true
  end
end
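A side note on removing elements during iteration: calling delete on an array inside its own each block can skip elements, because the remaining items shift left while the iteration index keeps advancing. reject! (or building a new array) avoids that. A minimal sketch with made-up data, where sample is a hypothetical helper:

```ruby
# Hypothetical helper returning three items that share one field name.
def sample
  [
    { "field name" => "Warning", "mneumonic" => "WARN1" },
    { "field name" => "Warning", "mneumonic" => "WARN2" },
    { "field name" => "Warning", "mneumonic" => "WARN3" }
  ]
end

# Deleting inside each: once WARN2 is removed, WARN3 slides into its
# slot and the loop ends without ever visiting it.
unsafe = sample
seen = []
unsafe.each do |hash|
  name = hash["field name"]
  if seen.include?(name)
    unsafe.delete(hash)
  else
    seen << name
  end
end

# reject! visits every element exactly once.
safe = sample
seen = []
safe.reject! do |hash|
  name = hash["field name"]
  duplicate = seen.include?(name)
  seen << name unless duplicate
  duplicate
end

unsafe_left = unsafe.map { |h| h["mneumonic"] } # WARN3 survives here
safe_left   = safe.map { |h| h["mneumonic"] }   # only WARN1 remains
```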

Upvotes: 1

vikram7

Reputation: 495

This version counts the number of times each "field name" occurs and sets "multiple" to true on every hash whose name occurs more than once, and to false otherwise. It marks the hashes in place but does not remove the duplicates.

field_name_counts = Hash.new 0

array.each do |hash|
  field_name = hash["field name"]
  field_name_counts[field_name] += 1
end

array.each do |hash|
  field_name = hash["field name"]
  # true when the name occurs more than once, false otherwise
  hash["multiple"] = field_name_counts[field_name] > 1
end
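To also drop the duplicates, the counting step could be combined with uniq, which keeps the first hash for each name and preserves order. A sketch under the assumption that only repeated names should receive the "multiple" key (the sample data and the name deduped are made up for illustration):

```ruby
array = [
  { "field name" => "Account number", "mneumonic" => "ACTNUM" },
  { "field name" => "Warning",        "mneumonic" => "WARN1" },
  { "field name" => "Warning",        "mneumonic" => "WARN2" }
]

# Count how often each field name occurs.
field_name_counts = Hash.new(0)
array.each { |hash| field_name_counts[hash["field name"]] += 1 }

# Keep the first hash per field name, in the original order.
deduped = array.uniq { |hash| hash["field name"] }

# Mark only the survivors whose name occurred more than once.
# uniq returns the same hash objects, so the originals are mutated too.
deduped.each do |hash|
  hash["multiple"] = true if field_name_counts[hash["field name"]] > 1
end
```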

Upvotes: 0
