Reputation: 101
I have an array containing n elements. Each element is a string of two words, so the array looks like this: ['England John', 'England Ben', 'USA Paul', 'England John']
I want to find the number of unique names for each country. For example, England would have 2 unique names, as John appears twice.
So far I have split the array into two arrays, one containing the countries, such as ['England', 'USA', ...], and the other containing the names, such as ['John', 'Paul', ...]. However, I'm unsure where to go from here.
Upvotes: 0
Views: 567
Reputation: 110755
arr = ['England John', 'England Ben', 'USA Paul', 'England John']
arr.uniq.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 }
#=> {"England"=>2, "USA"=>1}
This requires two passes through the array (arr.uniq being the first). To make only a single pass, one could do the following.
require 'set'
uniques = Set.new
arr.each_with_object(Hash.new(0)) { |s,h| h[s[/\S+/]] += 1 if uniques.add?(s) }
#=> {"England"=>2, "USA"=>1}
See the form of Hash::new that takes an argument (called the default value), and also Set#add?.
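As a rough illustration of those two methods (the variable names here are only for demonstration), the following sketch shows how a default value of 0 lets `+= 1` work on a missing key, and how `Set#add?` signals whether an element was already present:

```ruby
require 'set'

# Hash.new(0): reading a missing key yields 0, so += 1 needs no initialisation.
counts = Hash.new(0)
counts["England"] += 1
counts["England"] += 1
england = counts["England"]   # 2
usa     = counts["USA"]       # 0, and the key "USA" is not stored by this read

# Set#add? returns the set itself (truthy) when the element is new,
# and nil when the element was already present.
seen   = Set.new
first  = seen.add?("England John")   # the set, so the count is incremented
second = seen.add?("England John")   # nil, so the count is skipped
```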
It's not clear to me which of the two calculations would generally be faster.
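One way to get a feel for it is to time both on your own data. The sketch below is only a rough measurement, not a rigorous benchmark: the generated test array and the `elapsed` helper are made up for illustration, and results will vary with the array size and the proportion of duplicates.

```ruby
require 'set'

# Hypothetical test data: 100_000 strings with many repeats.
countries = %w[England USA France]
names     = %w[John Ben Paul Anna Maria]
arr = Array.new(100_000) { "#{countries.sample} #{names.sample}" }

# Illustrative helper: returns the wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

two_pass = nil
one_pass = nil

# Two passes: arr.uniq first, then the count.
t_two = elapsed do
  two_pass = arr.uniq.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 }
end

# One pass: a Set tracks which strings have been seen.
t_one = elapsed do
  uniques = Set.new
  one_pass = arr.each_with_object(Hash.new(0)) { |s, h| h[s[/\S+/]] += 1 if uniques.add?(s) }
end

puts format("uniq, then count: %.4fs", t_two)
puts format("single pass, Set: %.4fs", t_one)
```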
Upvotes: 1
Reputation: 8656
A bit more verbose than the other solutions, but it does not use transform_values, which is only part of core Ruby from 2.4 onwards (earlier versions need ActiveSupport).
require "set"
data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]
names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  country_data = accu[country] ||= Set.new
  country_data << name
end
names_per_country.each do |country, names|
  puts "#{country} has #{names.size} unique name(s)"
end
# => England has 2 unique name(s)
# => USA has 1 unique name(s)
# => Switzerland has 1 unique name(s)
This solution first transforms the array into a Hash structure, where the key is the country name and the value is a Set.
I've chosen Set because it takes care of the uniqueness part of your question automatically (a Set cannot contain duplicates).
After that, you can find the number of unique names per country by checking the size of the Set.
You can also retrieve the names themselves (the elements of the Set) if required.
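As a small sketch of that last point: Set includes Enumerable, so the names can be pulled out with the usual methods. The `england_names` and `usa_has_paul` variables are only illustrative.

```ruby
require "set"

data = ["England John", "England Ben", "USA Paul", "England John", "Switzerland Pascal"]

# Same structure as above: country => Set of names.
names_per_country = data.each_with_object({}) do |country_and_name, accu|
  country, name = country_and_name.split(" ")
  (accu[country] ||= Set.new) << name
end

# Sorted list of the unique names for one country.
england_names = names_per_country["England"].sort
puts england_names.join(", ")   # prints "Ben, John"

# Membership checks work directly on the Set.
usa_has_paul = names_per_country["USA"].include?("Paul")
```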
Upvotes: 0
Reputation: 11193
One liner option:
ary.uniq.group_by { |e| e.split.first }.transform_values(&:count)
#=> {"England"=>2, "USA"=>1}
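For anyone who finds the chain hard to read, here is the same calculation broken into its three steps. The intermediate variable names are only for illustration.

```ruby
ary = ['England John', 'England Ben', 'USA Paul', 'England John']

# Step 1: drop exact duplicate "country name" strings.
unique_pairs = ary.uniq
# ["England John", "England Ben", "USA Paul"]

# Step 2: group the remaining strings by their first word (the country).
by_country = unique_pairs.group_by { |e| e.split.first }
# "England" maps to ["England John", "England Ben"], "USA" to ["USA Paul"]

# Step 3: replace each group with its size.
counts = by_country.transform_values(&:count)
```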
Upvotes: 5
Reputation: 28305
The problem, really, is that you're storing this data as an array of strings. This is a poor choice of data structure, as it makes manipulation much harder.
Suppose, for example, we first convert this data into a Hash, which maps each country to the list of names:
data = ['England John', 'England Ben', 'USA Paul', 'England John']
mapped_names = {}
data.each do |item|
  country, name = item.split
  mapped_names[country] ||= []
  mapped_names[country] << name
end
Now, obtaining the count is quite easy:
mapped_name_counts = mapped_names.transform_values { |names| names.uniq.count }
The resulting variables are:
mapped_names # => {"England"=>["John", "Ben", "John"], "USA"=>["Paul"]}
mapped_name_counts # => {"England"=>2, "USA"=>1}
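And if using ruby version 2.7 or later, Enumerable#tally offers a shorter route. Note that tally takes no block and simply counts equal elements, so it is applied to the countries of the de-duplicated original array rather than to the hash:
mapped_name_counts = data.uniq.map { |item| item.split.first }.tally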
Upvotes: 3