Selim Alawwa
Selim Alawwa

Reputation: 762

Sort an array of arrays based on the order in another array

I have an array of arrays:

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
  ["delivered", 23], ["scheduled", 1], ["canceled", 51]
]

My sorting array is

order_array = [
  "ready", "in_progress", "recieved", "shipped", "scheduled", "pick_up",
 "delivered", "canceled", "failed", "refunded", "refund_failed"
]

I need to order x based on the value of the first element in each subarray. The required sorted array is:

[
  ["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23],
  ["canceled", 51], ["refunded", 1]
]

Using sort_by doesn't result in the required sorting, it leads to the same array.

result = x.sort_by {|u| order_array.index(u)}
# => [
#      ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1],
#      ["delivered", 23], ["scheduled", 1], ["canceled", 51]
# ]

Upvotes: 0

Views: 165

Answers (6)

Cary Swoveland
Cary Swoveland

Reputation: 110675

I've assumed:

  • the first element of each element of x is not necessarily unique;
  • all elements of x whose first element is the same and whose first element is a member of order_array appear consecutively in the returned (sorted) array in the order in which those elements appear in x;
  • any elements of x whose first element is not a member of order_array appears in the returned (sorted) array after all elements whose first element is in sorted_array, and all such elements appear in the returned array (at the end) in the order in which they occur in x; and
  • efficiency is paramount.

x = [
  ["ready", 5], ["shipped", 1], ["pending", 1], ["refunded", 1], ["originated", 3],
  ["delivered", 23], ["scheduled", 1], ["ready", 8], ["canceled", 51]
]

order_array = [
  "ready", "in_progress", "received", "shipped", "scheduled", "pick_up",
  "delivered", "canceled", "failed", "refunded", "refund_failed"
]

order_pos = order_array.each_with_object({}) { |word,h| h[word] = [] }
  #=> {"ready"=>[], "in_progress"=>[], "received"=>[], "shipped"=>[],
  #    "scheduled"=>[], "pick_up"=>[], "delivered"=>[], "canceled"=>[],
  #    "failed"=>[], "refunded"=>[], "refund_failed"=>[]} 
back = x.each_with_index.with_object([]) { |((word,v),i),back|
  order_pos.key?(word) ? (order_pos[word] << i) : back << [word,v] }
  #=> [["pending", 1], ["originated", 3]] 
order_pos.flat_map { |word,offsets| offsets.map { |i| x[i] } }.concat(back)
  #=> [["ready", 5], ["ready", 8], ["shipped", 1], ["scheduled", 1],
  #    ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1],
  #    ["originated", 3]] 

Note:

order_pos
  #=> {"ready"=>[0, 7], "in_progress"=>[], "received"=>[], "shipped"=>[1],
  #    "scheduled"=>[6], "pick_up"==>[], "delivered"=>[5], "canceled"=>[8],
  #    "failed"=>[], "refunded"=>[3], "refund_failed"=>[]} 

It is necessary to initialise order_pos in order for its keys to be ordered by order_arr. This is an example of the worth of a controversial change made in Ruby 1.9 which guaranteed that hash keys will remain in key-insertion order.

Upvotes: 0

iGian
iGian

Reputation: 11193

I'd suggest

x.keep_if { |e| order_array.include? e[0] }.sort_by { |e| order_array.index(e[0]) }

Since some values are not elements of order_array, for example "pending".

#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]


Benchmarked the answers up to now 500.times:

#        user       system     total       real
# sawa   0.006698   0.000132   0.006830 (  0.006996) # on the first method
# ray    0.005543   0.000123   0.005666 (  0.005770)
# igian  0.001923   0.000003   0.001926 (  0.001927)
# srack  0.005270   0.000168   0.005438 (  0.005540) # on the last method


Just for fun I tried to find a faster method for Ruby 2.5:

xx = x.to_h # less than Ruby 2.6
order_array.each.with_object([]) { |k, res| res << [k, xx[k]] if xx.has_key? k }

Upvotes: 2

SRack
SRack

Reputation: 12203

You're almost there with this: index isn't working as you're comparing the full array, rather than the first element of it. This will work:

result = x.sort_by { |u| order_array.index(u[0]) || 100 }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1], ["pending", 1]]

Please note, the 100 is there to default to the back of the sort if the value isn't found in order_array.


Edit

This was initially accepted, despite including ["pending", 1] suggesting it fit the requirements; however, here's a solution to avoid the unwanted entry, which also handles duplicates should the need arise.

order_array.each_with_object([]) { |ordered_by, array| array.push(*x.select { |item| item[0] == ordered_by }) }
#=> [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

Or, very fast though still allowing for duplicate values under each ordered item:

hash = x.each_with_object(Hash.new { |h,k| h[k] = [] }) { |item, h| h[item[0]] << item[1] }
order_array.flat_map { |key| [key, hash[key]] }

Benchmark

Here's a benchmark for this scenario with a larger dataset: https://repl.it/repls/SentimentalAdequateClick. Looks like Sawa's methods lead the way, though my last effort works handily should there be duplicate values in future. Also, my second effort sucks (which surprised me a little) :)

Upvotes: 4

steenslag
steenslag

Reputation: 80065

assoc seems helpful: "Searches through an array whose elements are also arrays comparing obj with the first element of each contained array using obj.==."

order_array.map{|e| x.assoc(e) }.compact

Upvotes: 4

ray
ray

Reputation: 5552

You can try below code to find output efficiently,

order_array.map { |p| x.detect { |y| y[0] == p } }.compact
# => [["ready", 5], ["shipped", 1], ["scheduled", 1], ["delivered", 23], ["canceled", 51], ["refunded", 1]]

Upvotes: 1

sawa
sawa

Reputation: 168101

h = x.to_h
# => {"ready"=>5,
# "shipped"=>1,
# "pending"=>1,
# "refunded"=>1,
# "delivered"=>23,
# "scheduled"=>1,
# "canceled"=>51}

order_array.map{|key| [key, h[key]] if h.key?(key)}.compact
# => [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]}
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

order_array.map{|k| h[k]}.compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

or

h = x.to_h{|k, v| [k, [k, v]]}
#=> {"ready"=>["ready", 5],
# "shipped"=>["shipped", 1],
# "pending"=>["pending", 1],
# "refunded"=>["refunded", 1],
# "delivered"=>["delivered", 23],
# "scheduled"=>["scheduled", 1],
# "canceled"=>["canceled", 51]}

h.values_at(*order_array).compact
#=> [["ready", 5],
# ["shipped", 1],
# ["scheduled", 1],
# ["delivered", 23],
# ["canceled", 51],
# ["refunded", 1]]

Upvotes: 5

Related Questions