Reputation: 571
I have the following code, which counts the frequency of each letter in a string (in this case, read from a file):
def letter_frequency(file)
  letters = 'a'..'z'
  File.read(file).
    split(//).
    group_by { |letter| letter.downcase }.
    select { |key, val| letters.include? key }.
    collect { |key, val| [key, val.length] }
end

letter_frequency(ARGV[0]).sort_by { |key, val| -val }.each { |pair| p pair }
This works great, but I would like to know whether there is some way in Ruby to do something similar that catches all the other possible symbols, i.e. spaces, commas, periods, and everything in between. To put it more simply: is there something similar to 'a' .. 'z' that holds all the symbols? Hope that makes sense.
Upvotes: 0
Views: 95
Reputation: 1044
You don't need a range to count every possible character, because "every possible character" is the entire domain. A range is only useful when you want to restrict the count to a specific subset of that domain.
This is probably a faster implementation that counts all characters in the file:
def char_frequency(file_name)
  ret_val = Hash.new(0)
  File.open(file_name) { |file| file.each_char { |char| ret_val[char] += 1 } }
  ret_val
end

p char_frequency("1003v-mm") #=> {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...
For reference I used this test file.
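If you do want only a subset, such as the non-letter symbols the question asks about, one option is to filter with a regular expression while counting instead of building a range. The following is a minimal sketch working on a string rather than a file; the helper name `char_frequency_in`, its `only:` keyword, and the sample text are assumptions made up for illustration:

```ruby
# Hypothetical helper: counts every character in +text+. When +only+ is
# given, characters that do not match that pattern are skipped.
def char_frequency_in(text, only: nil)
  counts = Hash.new(0)
  text.each_char do |char|
    next if only && !only.match?(char)
    counts[char] += 1
  end
  counts
end

sample = "Hi, there. Hi!"

# Every character, letters included.
all_counts = char_frequency_in(sample)

# Only characters that are not ASCII letters (spaces, punctuation, ...).
symbol_counts = char_frequency_in(sample, only: /[^a-z]/i)
```

With no pattern every character is counted, so a "range of all symbols" is never needed; the pattern only comes into play when you want to narrow the domain.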
Upvotes: 1
Reputation: 2246
It doesn't use much Ruby magic with Ranges, but a simple approach is to build a character counter that iterates over each character in a string and tallies the totals:
class CharacterCounter
  def initialize(text)
    @characters = text.split("")
  end

  def character_frequency
    character_counter = {}
    @characters.each do |char|
      character_counter[char] ||= 0
      character_counter[char] += 1
    end
    character_counter
  end

  def unique_characters
    character_frequency.map { |key, value| key }
  end

  def frequency_of(character)
    character_frequency[character] || 0
  end
end

counter = CharacterCounter.new("this is a test")
counter.character_frequency # => {"t"=>3, "h"=>1, "i"=>2, "s"=>3, " "=>3, "a"=>1, "e"=>1}
counter.unique_characters # => ["t", "h", "i", "s", " ", "a", "e"]
counter.frequency_of 't' # => 3
counter.frequency_of 'z' # => 0
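For comparison, the same counting loop can probably be written more compactly with a hash whose default value is 0, and the result sorted most-frequent-first the way the question prints it. This is only a sketch; the standalone `character_frequency` method below is a hypothetical variant, not part of the class above:

```ruby
# Hypothetical standalone variant: Hash.new(0) makes missing keys read as 0,
# so no explicit "||= 0" initialisation is needed.
def character_frequency(text)
  text.each_char.with_object(Hash.new(0)) { |char, counts| counts[char] += 1 }
end

counts = character_frequency("this is a test")

# Sort by descending count; the order among equal counts is not guaranteed.
sorted = counts.sort_by { |char, count| -count }
sorted.each { |pair| p pair }
```

Because of the default value, looking up a character that never occurs (for example `counts["z"]`) returns 0 without a separate `frequency_of` guard.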
Upvotes: 0