lostcoder
lostcoder

Reputation: 5

Retrieving Unique in Ruby

I'm currently using Ruby Version 1.8.7 and I've been searching around but unable to find a solution for this. I am currently working on creating a unique vendor identifier. But I'll simplify the issue here.

I have a 2D Array for vendors and products:

A = [["C"], ["A","D"], ["A","B"], ["B","C","E","F"], ["B","G","K"], [], ["H","I"], [], [], ["I"], ["J"]]

What I need to do here is to retrieve the top 5 users (Array index number) with the most number of unique product. In this case.

The top 5 vendors would be:

1 - A,D
3 - B,C,E,F
4 - B,G,K
6 - H,I
10 - J

Example:

Vendor 3 has Products ["B","C","E","F"] but Vendor 4 has products ["B","G","K"].
Since vendor 4 and 3 both has ["B"]
Vendor 3 has 3 unique products ["C","E","F"]
Vendor 4 has 2 unique products ["G","K"]

What I need to return is an Array of Vendors (Based on their index in the 2D Array) of the Top 5 Vendors.

Here are my codes so far:

def test

  vendors = [[C], [A,D], [A,B], [B,C,E,F], [B,G,K], [], [H,I], [], [], [I], [J]]
  useridArr = Array(0..vendors.length-1)
  vendors = inplace_quicksort(vendors, 0, vendors.length-1,useridArr)
  getUnique(vendors,useridArr, vendors.length-1)
end

def partition_array(array, left, right, pivot_index, arr)
  pivot_value = array[pivot_index].length
  arr[pivot_index], arr[right] = arr[right], arr[pivot_index]
  array[pivot_index], array[right] = array[right], array[pivot_index]
  store_index = left

  (left..right-1).each do |i|
    if array[i].length < pivot_value
      arr[i], arr[store_index] = arr[store_index], arr[i]
      array[i], array[store_index] = array[store_index], array[i]
      store_index = store_index + 1
    end
  end

  arr[store_index], arr[right] = arr[right], arr[store_index]
  array[store_index], array[right] = array[right], array[store_index]
  return store_index
end

def inplace_quicksort(array, left, right, indexArr)
  if left < right
    pivot_index = (left + ((right - left) / 2)).to_i
    new_pivot_index = partition_array(array, left, right, pivot_index,indexArr)
    inplace_quicksort(array, left, new_pivot_index - 1,indexArr)
    inplace_quicksort(array, new_pivot_index + 1, right,indexArr)
  end
  return array
end

def getUnique(vendors,useridArr, searchFor)
  while searchFor != -1
    p vendors.map {|a| a & vendors[searchFor] }
    searchFor = searchFor - 1 
  end
end

Upvotes: 0

Views: 527

Answers (1)

Cary Swoveland
Cary Swoveland

Reputation: 110675

Each of the 11 elements of A corresponds to a vendor and there are (coincidentally) 11 products1:

A = [[:C], [:A, :D], [:A, :B], [:B, :C, :E, :F], [:B, :G, :K], [],
     [:H, :I], [], [], [:I], [:J]] 

products = A.flatten
  #=> [:C, :A, :D, :A, :B, :B, :C, :E, :F, :B, :G, :K, :H, :I, :I, :J]
products.uniq
  #=> [:C, :A, :D, :B, :E, :F, :G, :K, :H, :I, :J]
products.uniq.size
  #=> 11

We begin by computing the number of instances of each product:

g = Hash.new(0)
counts = products.each_with_object(g) { |p,h| h[p] += 1 }
  #=> {:C=>2, :A=>2, :D=>1, :B=>3, :E=>1, :F=>1, :G=>1, :K=>1,
  #    :H=>1, :I=>2, :J=>1}

g = Hash.new(0) creates an empty hash with a default value of zero. That means that if g does not have a key k, g[k] will return zero. Notice the expression h[p] += 1. That is called an abbreviated assignment. It simply means the expression is expanded to:

h[p] = h[p] + 1

before it is evaluated. If h does not have a key p, h[p] on the right side returns zero, so h[p] is set equal to 0+1 #=> 1.

Everything above would normally be written more compactly as follows:

counts = A.flatten.each_with_object(Hash.new(0)) { |p,h| h[p] += 1 }

The products offered by only one vendor are given by:

unique_products = counts.select { |_,count| count == 1 }.keys
  #=> [:D, :E, :F, :G, :K, :H, :J]

The vendor at offset 3 in A has two unique products, :E and :F:

[:B,:C,:E,:F] & unique_products
   #=> [:E, :F]

That is:

([:B,:C,:E,:F] & unique_products).size
   #=> 2

If we want the five vendors with the greatest numbers of unique products, ordered by decreasing numbers of unique products, we can do this:

A.sort_by { |a| -(a & unique_products).size }.first(5)
  #=> [[:B, :G, :K], [:B, :C, :E, :F], [:H, :I], [:A, :D], [:J]]  

In Ruby 2.2+ we can do this more directly, using Enumerable#max_by:

A.max_by(5) { |a| (a & unique_products).size }
  #=> [[:B, :G, :K], [:B, :C, :E, :F], [:J], [:A, :D], [:H, :I]] 

The ordering is slightly different, but that's because the last three vendors of the top five all have one unique product.

Wrapping this up, we might write a method as follows:

def max_unique_products(products_by_vendor, n)
  counts = products_by_vendor.flatten.
    each_with_object(Hash.new(0)) { |p,h| h[p] += 1 }
  unique_products = counts.select { |_,count| count == 1 }.keys
  products_by_vendor.max_by(n) { |a| (a & unique_products).size }
end

max_unique_products(A, 5)
  #=> [[:B, :G, :K], [:B, :C, :E, :F], [:J], [:A, :D], [:H, :I]] 

Edit 1: I forgot you wanted the indices of the top vendors. Just change the last line of the method above to:

products_by_vendor.each_with_index.
  max_by(n) { |a,_| (a & unique_products).size }.map(&:last)

or:

products_by_vendor.each_with_index.
  sort_by { |a,_| -(a & unique_products).size }.first(5).map(&:last)

and you will get:

max_unique_products(A, 5)
  #=> [4, 3, 10, 1, 6] 

Edit 2: To make this work with Ruby v1.8.7, try this:

def max_unique_products(products_by_vendor, n)
  counts = products_by_vendor.flatten.
    reduce(Hash.new(0)) { |h,p| h[p] += 1; h }
  unique_products = counts.select { |_,count| count == 1 }.map(&:first)
  products_by_vendor.each_with_index.
    sort_by { |a,_| -(a & unique_products).size }.first(5).map(&:last)
end

It works with v2.2 and I believe that all the methods existed in v1.8.7.

1. The OP originally defined A as [[C], [A, D]...]. I changed that to [[:C], [:A, :D]...] in my answer. lostcoder then changed it to [["C"], ["A", "D"]...].

Upvotes: 2

Related Questions