Maxim Fedotov
Maxim Fedotov

Reputation: 1357

Efficient way to use max_by with multiple conditions

I've been trying to figure out how to use max_by and max methods, with multiple conditions. While I thought I did, after extensive testing it turned out not to be the case.

For example, I have a hash whose values are hashes:

chapters = { "686050": {
        "volume": '1',
        "chapter": '1',
        "title": 'Chapter Title',
        "lang_code": 'gb',
        "timestamp": 1_565_434_815
      },
      "686049": {
        "volume": '2',
        "chapter": '1',
        "title": 'Chapter Title',
        "lang_code": 'gb',
        "timestamp": 1_565_300_815
      },
      "686048": {
        "volume": '2',
        "chapter": '2',
        "title": 'Chapter Title',
        "lang_code": 'gb',
        "timestamp": 1_565_300_815
      }
    }

I want to find one hash, that has the maximum volume AND chapter. So in this case, I am expecting to get the one with volume 2 and chapter 2.

My first implementation was using max:

chapters.max do |a, b|
  a[1][:volume].to_d <=> b[1][:volume].to_d &&
    a[1][:chapter].to_d <=> b[1][:chapter].to_d
end

When looking at this, my thinking was that it would first compare by volume and then by chapter. This code have worked... until I changed the hash to be:

  "686050": {
    "volume": '1',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_434_815
  },
  "686049": {
    "volume": '2',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_300_815
  }

Which suddenly started to return the one with volume 1 and chapter 2! I am still unsure why this was happening...

So I decided to try max_by, using this chapters.max_by { |_, v| v[:volume].to_d && v[:chapter].to_d }, with the same assumption. Which also ended up failing for when there is a single chapter per volume.

My solution ended up being:

chapters
  .group_by { |_, v| v[:volume].to_d }.max.last.to_h
  .max_by { |_, v| v[:chapter].to_d }

Which works for obvious reasons, but feels quite inefficient. There must be a way to use a single max or max_by. Would love to hear some ideas on this

Upvotes: 0

Views: 1924

Answers (2)

Cary Swoveland
Cary Swoveland

Reputation: 110725

You may do the following.

def max_val(h,key)
  h.map { |_,g| g[key] }.max
end

max_volume = max_val(h, :volume)    #=> "2"
max_chapter = max_val(h, :chapter)  #=> "2"

chapters.find { |_,g| g[:volume] == max_volume && g[:chapter] == max_chapter } 
  #=> [:"686048", {:volume=>"2", :chapter=>"2", :title=>"Chapter Title",
  #                :lang_code=>"gb", :timestamp=>1565300815}] 

nil is returned if no value contains the maximum values of both keys. Replace find with select if all key-value pairs having the desired property are to be returned (i.e., if there are ties). That array would be empty if no value contains the maximum values of both keys.

This requires three passes through the hash. It could be done in a single pass but it's messy and I expect it would be more time-consuming, largely because Enumerable#map and Array#max are both implemented in C.

The desired result could be obtained in a single pass as follows.

max_volume  = 0.chr
max_chapter = 0.chr
candidate = [nil, {}]
chapters.each do |k,g|
  old_max_volume = max_volume
  max_volume = [max_volume, g[:volume]].max
  old_max_chapter = max_chapter 
  max_chapter = [max_chapter, g[:chapter]].max
  if max_volume > old_max_volume || max_chapter > old_max_chapter
    candidate = (max_volume == g[:volume] && max_chapter == g[:chapter]) ? 
     [k,g] : [nil, {}]
  end
end
candidate
  #=> [:"686048", {:volume=>"2", :chapter=>"2", :title=>"Chapter Title",
  #    :lang_code=>"gb", :timestamp=>1565300815}] 

Now set:

chapters[:"686048"][:volume] = '1'

The first method above now returns nil, the second (one-pass), returns [nil, {}]

Upvotes: 1

cavin kwon
cavin kwon

Reputation: 501

Enumerable#max_by

# example 1
a = %w[albatross dog horse]
a.max_by(2) {|x| x.length } #=> ["albatross", "horse"]

# example 2 - Arrays are compared in an “element-wise” manner
arrays = [['1', '1'], ['2', '1'], ['2', '2']]
arrays.max_by(2) { |arr| arr } #=> ["2","2"], ["2", "1"]

So

chapters.max_by { |_, elem| [elem[:volume], elem[:chapter]] }

Let's test

book1 = { 
  "686050": {
    "volume": '1',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_434_815
  },
  "686049": {
    "volume": '2',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_300_815
  },
  "686048": {
    "volume": '2',
    "chapter": '2',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_300_815
  }
}

book2 = {
  "686050": {
    "volume": '1',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_434_815
  },
  "686049": {
    "volume": '2',
    "chapter": '1',
    "title": 'Chapter Title',
    "lang_code": 'gb',
    "timestamp": 1_565_300_815
  }  
}

pp book1.max_by(2) { |_, elem| [elem[:volume], elem[:chapter]] }
pp book2.max_by { |_, elem| [elem[:volume], elem[:chapter]] }

outputs

# 1st test
[[:"686048",
  {:volume=>"2",
   :chapter=>"2",
   :title=>"Chapter Title",
   :lang_code=>"gb",
   :timestamp=>1565300815}],
 [:"686049",
  {:volume=>"2",
   :chapter=>"1",
   :title=>"Chapter Title",
   :lang_code=>"gb",
   :timestamp=>1565300815}]]

# 2nd test
[:"686049",
 {:volume=>"2",
  :chapter=>"1",
  :title=>"Chapter Title",
  :lang_code=>"gb",
  :timestamp=>1565300815}]

Upvotes: 0

Related Questions