Ben Williams

Reputation: 101

How to find the number of unique occurrences for an Array in Ruby

I have an array containing n elements. Each element is a string of two words.

This makes the array look like this: ['England John', 'England Ben', 'USA Paul', 'England John']

I want to find the number of unique names for each country. For example, England would have 2 unique names, since John appears twice.

So far I have split the array into two arrays, one containing the countries, such as ['England', 'USA', ...], and the other containing the names, ['John', 'Paul', ...]. However, I'm unsure where to go from here.

Upvotes: 0

Views: 567

Answers (4)

Cary Swoveland

Reputation: 110755

arr = ['England John', 'England Ben', 'USA Paul', 'England John']

arr.uniq.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 }
  #=> {"England"=>2, "USA"=>1}

This requires two passes through the array (arr.uniq being the first). To make only a single pass one could do the following.

require 'set'

uniques = Set.new
arr.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 if uniques.add?(s) }
  #=> {"England"=>2, "USA"=>1}

See the form of Hash::new that takes an argument (called the default value), and also Set#add?.
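As a small illustration of those two methods (the variable names below are only for demonstration): a hash created with Hash.new(0) reads missing keys as 0, so += 1 works the first time a key is seen, and Set#add? returns nil when the element is already present, which is what lets the `if` guard skip duplicates.

```ruby
require 'set'

counts = Hash.new(0)        # missing keys read as 0
counts["England"] += 1      # same as counts["England"] = 0 + 1
france = counts["France"]   # reads the default 0 without storing the key

seen = Set.new
first  = seen.add?("England John")  # returns the set itself (truthy) on first insertion
second = seen.add?("England John")  # returns nil because the element is already present
```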

It's not clear to me which of the two calculations would generally be faster.
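One way to find out for a particular workload is to time both with the standard Benchmark module. The sketch below is only a rough harness: the generated data (20,000 strings drawn from 50 made-up countries and 200 made-up names) and the lambda names are assumptions for illustration, and the result will depend on the Ruby version and on how many duplicates the real data contains.

```ruby
require 'benchmark'
require 'set'

# Hypothetical test data, seeded so that runs are repeatable.
countries = Array.new(50)  { |i| "Country#{i}" }
names     = Array.new(200) { |i| "Name#{i}" }
rng = Random.new(42)
arr = Array.new(20_000) { "#{countries.sample(random: rng)} #{names.sample(random: rng)}" }

# The two calculations from above, wrapped so they can be called repeatedly.
two_pass = ->(a) { a.uniq.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 } }
one_pass = lambda do |a|
  uniques = Set.new
  a.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 if uniques.add?(s) }
end

Benchmark.bm(10) do |x|
  x.report("two pass:") { 10.times { two_pass.call(arr) } }
  x.report("one pass:") { 10.times { one_pass.call(arr) } }
end
```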

Upvotes: 1

Pascal

Reputation: 8656

A bit more verbose than the other solutions, but it does not rely on transform_values (available from ActiveSupport, or in core Ruby from 2.4).

require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  country_data = accu[country] ||= Set.new
  country_data << name
end

names_per_country.each do |country, names|
  puts "#{country} has #{names.size} unique name(s)"
end

# => England has 2 unique name(s)
# => USA has 1 unique name(s)
# => Switzerland has 1 unique name(s)

This solution first transforms the array into a Hash, where the key is the country name and the value is a Set. I've chosen Set because it takes care of the uniqueness part of your question automatically (a Set cannot contain duplicates).

After that you can find the number of unique names per country by checking the size of the Set. You can also retrieve the names themselves (the elements of the Set) if required.
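A short sketch of both lookups, rebuilding the same hash of Sets as above. The names `counts` and `england_names` are introduced here only for illustration, and map plus to_h is used so that transform_values is still not needed.

```ruby
require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

# Build country => Set of names, as in the code above.
names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  (accu[country] ||= Set.new) << name
end

# Number of unique names per country.
counts = names_per_country.map { |country, names| [country, names.size] }.to_h
#=> {"England"=>2, "USA"=>1, "Switzerland"=>1}

# The unique names themselves, sorted for a stable order.
england_names = names_per_country["England"].to_a.sort
#=> ["Ben", "John"]
```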

Upvotes: 0

iGian

Reputation: 11193

One liner option:

ary = ['England John', 'England Ben', 'USA Paul', 'England John']
ary.uniq.group_by { |e| e.split.first }.transform_values(&:count)
#=> {"England"=>2, "USA"=>1}
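To see what each step of the chain contributes, here it is split into intermediate values (the names `deduped`, `grouped` and `counts` are only for illustration). Note that Hash#transform_values is part of core Ruby from 2.4 onwards.

```ruby
ary = ['England John', 'England Ben', 'USA Paul', 'England John']

# 1. Drop exact duplicate "country name" strings.
deduped = ary.uniq
#=> ["England John", "England Ben", "USA Paul"]

# 2. Group the remaining strings by their first word (the country).
grouped = deduped.group_by { |e| e.split.first }
#=> {"England"=>["England John", "England Ben"], "USA"=>["USA Paul"]}

# 3. Replace each group with its size.
counts = grouped.transform_values(&:count)
#=> {"England"=>2, "USA"=>1}
```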

Upvotes: 5

Tom Lord

Reputation: 28305

The problem, really, is that you're storing this data as an array of strings. This is a poor choice of data structure, as it makes manipulation much harder.

Suppose, for example, we first convert this data into a Hash, which maps each country to the list of names:

data = ['England John', 'England Ben', 'USA Paul', 'England John']

mapped_names = {}

data.each do |item|
  country, name = item.split
  mapped_names[country] ||= []
  mapped_names[country] << name
end

Now, obtaining the count is quite easy:

mapped_name_counts = mapped_names.transform_values { |names| names.uniq.count }

The resulting variables are:

mapped_names # => {"England"=>["John", "Ben", "John"], "USA"=>["Paul"]}
mapped_name_counts # => {"England"=>2, "USA"=>1}

And if using Ruby 2.7 (not yet released at the time of writing!), which adds Enumerable#tally, the counts could even be computed directly from the original array:

mapped_name_counts = data.uniq.map { |item| item.split.first }.tally

Upvotes: 3
