Reputation: 589

Ruby idiom for sorting on two fields

I need the Ruby idiom for sorting on two fields. In Python if you sort a list of two-element tuples, it sorts based on the first element, and if two elements are equal then the sort is based on the second element.

One example is the following sorting code in Python (word sort from longest to shortest and consider second element to break ties) from http://www.pythonlearn.com/html-008/cfbook011.html

txt = 'but soft what light in yonder window breaks'
words = txt.split()
t = list()
for word in words:
   t.append((len(word), word))

t.sort(reverse=True)

res = list()
for length, word in t:
    res.append(word)

print res

What I came up in Ruby is the following code that uses structs

txt = 'but soft what light in yonder window breaks'
words = txt.split()
t = []

tuple = Struct.new(:len, :word)
for word in words
    tpl = tuple.new
    tpl.len = word.length
    tpl.word =  word
    t << tpl
end

t = t.sort {|a, b| a[:len] == b[:len] ?
    b[:word] <=> a[:word] : b[:len] <=> a[:len]
    }

res = []
for x in t
    res << x.word
 end

puts res

I would like to know if there are better ways (less code) to achieve this stable sort.

Upvotes: 2

Answers (3)

Cary Swoveland

Reputation: 110685

Ruby makes this easy, using Enumerable#sort_by will and Array#<=> for sorting.

def sort_on_two(arr, &proc)
  arr.map.sort_by { |e| [proc[e], e] }.reverse
end

txt = 'but soft what light in yonder window breaks'

sort_on_two(txt.split) { |e| e.size }
  #=> ["yonder", "window", "breaks", "light", "what", "soft", "but", "in"]

sort_on_two(txt.split) { |e| e.count('aeiou') }
  #=> ["yonder", "window", "breaks", "what", "soft", "light", "in", "but"]

sort_on_two(txt.split) { |e| [e.count('aeiou'), e.size] }
  #=> ["yonder", "window", "breaks", "light", "what", "soft", "but", "in"]

Note that in recent versions of Ruby, proc.call(e) can be written proc[e], proc.yield(e) or proc.(e).

Upvotes: 1

Jordan Running

Reputation: 106027

I think you're overthinking this.

txt = 'but soft what light in yonder window breaks'

lengths_words = txt.split.map {|word| [ word.size, word ] }
# => [ [ 3, "but" ], [ 4, "soft" ], [ 4, "what" ], [ 5, "light" ], ... ]

sorted = lengths_words.sort
# => [ [ 2, "in" ], [ 3, "but" ], [ 4, "soft" ], [ 4, "what" ], ... ]

If you really want to use Struct, you can:

tuple = Struct.new(:length, :word)

tuples = txt.split.map {|word| tuple.new(word.size, word) }
# => [ #<struct length=3, word="but">, #<struct length=4, word="soft">, ... ]

sorted = tuples.sort_by {|tuple| [ tuple.length, tuple.word ] }
# => [ #<struct length=2, word="in">, #<struct length=3, word="but">, ... ]

This is equivalent:

sorted = tuples.sort {|tuple, other| tuple.length == other.length ?
                                       tuple.word <=> other.word : tuple.length <=> other.length }

(Note that it's sort this time, not sort_by.)

...but since we're using a Struct we can make this nicer by defining our own comparison operator (<=>), which sort will invoke (the same works in any Ruby class):

tuple = Struct.new(:length, :word) do
  def <=>(other)
    [ length, word ] <=> [ other.length, other.word ]
  end
end

tuples = txt.split.map {|word| tuple.new(word.size, word) }
tuples.sort
# => [ #<struct length=2, word="in">, #<struct length=3, word="but">, ... ]

There are other options for more complex sorting. If you wanted to get longest words first, for example:

lengths_words = txt.split.map {|word| [ word.size, word ] }
sorted = lengths_words.sort_by {|length, word| [ -length, word ] }
# => [ [ 6, "breaks" ], [ 6, "window" ], [ 6, "yonder" ], [ 5, "light" ], ... ]

Or:

tuple = Struct.new(:length, :word) do
  def <=>(other)
    [ -length, word ] <=> [ -other.length, other.word ]
  end
end

txt.split.map {|word| tuple.new(word.size, word) }.sort
# => [ #<struct length=6, word="breaks">, #<struct length=6, word="window">, #<struct length=6, word="yonder">, ... ]

As you can see, I'm relying a lot on Ruby's built-in ability to sort arrays based on their contents, but you can also "roll your own" if you prefer, which might perform better with many, many items. Here's a comparison method that's equivalent to your t.sort {|a, b| a[:len] == b[:len] ? ... } code (plus a bonus to_s method):

tuple = Struct.new(:length, :word) do
  def <=>(other)
    return word <=> other.word if length == other.length
    length <=> other.length
  end

  def to_s
    "#{word} (#{length})"
  end
end

sorted = txt.split.map {|word| tuple.new(word.size, word) }.sort
puts sorted.join(", ")
# => in (2), but (3), soft (4), what (4), light (5), breaks (6), window (6), yonder (6)

Finally, a couple comments on your Ruby style:

You pretty much never see for in idiomatic Ruby code. each is the idiomatic way to do almost all iteration in Ruby, and "functional" methods like map, reduce and select are also common. Never for.
A great advantage of Struct is that you get accessor methods for each property, so you can do tuple.word instead of tuple[:word].
Methods with no arguments are called without parentheses: txt.split.map, not txt.split().map

Upvotes: 5

tomsoft

Reputation: 4577

UPDATE: my first answer was wrong (this time!), thanks to @mu is too short comment

Your code is ok to sort on two criteria, but if you just want to achieve the same result, the best is to do the following:

txt.split.sort_by{|a| [a.size,a] }.reverse
=> ["breaks", "window", "yonder", "light", "soft", "what", "but", "in"]

The first check will use the size operator, and if the result is zero, it will use the second one....

If you really want to keep your data structure, it's same principle:

t.sort_by{ |a| [a[:len],a[:word]] }.reverse

Upvotes: 0

Ruby idiom for sorting on two fields

Answers (3)

Related Questions