Luminusss

Reputation: 571

Counting frequency of symbols

I have the following code, which counts the frequency of each letter in a string (in this case, read from a file):

def letter_frequency(file)
  letters = 'a' .. 'z'
  File.read(file) .
  split(//) .
  group_by {|letter| letter.downcase} .
  select   {|key, val| letters.include? key} .
  collect  {|key, val| [key, val.length]}
end

letter_frequency(ARGV[0]).sort_by {|key, val| -val}.each {|pair| p pair}

This works great, but is there some way in Ruby to do something similar that catches all the different possible symbols, i.e. spaces, commas, periods, and everything in between? To put it more simply: is there something like 'a' .. 'z' that holds all the symbols? Hope that makes sense.

Upvotes: 0

Views: 95

Answers (2)

Gabriel de Oliveira

Reputation: 1044

You don't need a range when you want to count every possible character, because the set of all characters is the whole domain. A range is only needed when you want to restrict the count to a subset of that domain.

This is probably a faster implementation that counts all characters in the file:

def char_frequency(file_name)
  ret_val = Hash.new(0)
  File.open(file_name) {|file| file.each_char {|char| ret_val[char] += 1 } }
  ret_val
end

p char_frequency("1003v-mm")  #=>  {"\r"=>56, "\n"=>56, " "=>2516, "\xC9"=>2, ...

For reference I used this test file.
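
If only a subset is wanted after all, such as punctuation and whitespace rather than every character, one option is to filter with a POSIX bracket expression instead of a range. This is a minimal sketch, not part of the code above: the helper name `frequency_matching` and the `SYMBOLS` constant are made up for illustration.

```ruby
# Hypothetical helper: count only the characters that match a pattern.
def frequency_matching(text, pattern)
  counts = Hash.new(0) # missing keys start at 0
  text.each_char { |char| counts[char] += 1 if char =~ pattern }
  counts
end

# Assumption: "symbols" means punctuation plus whitespace.
SYMBOLS = /[[:punct:][:space:]]/

result = frequency_matching("Hi, there. Hi!", SYMBOLS)
p result  #=> {","=>1, " "=>2, "."=>1, "!"=>1}
```

Swapping the pattern (for example `/[a-z]/i`) should give back the letters-only behaviour from the question.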

Upvotes: 1

Pete

Reputation: 2246

It doesn't use much Ruby magic with ranges, but a simple approach is to build a character counter that iterates over each character in a string and keeps a running total:

class CharacterCounter
  def initialize(text)
    @characters = text.split("")
  end

  def character_frequency
    character_counter = {}
    @characters.each do |char|
      character_counter[char] ||= 0
      character_counter[char] += 1
    end

    character_counter
  end

  def unique_characters
    character_frequency.keys
  end

  def frequency_of(character)
    character_frequency[character] || 0
  end
end

counter = CharacterCounter.new("this is a test")
counter.character_frequency # => {"t"=>3, "h"=>1, "i"=>2, "s"=>3, " "=>3, "a"=>1, "e"=>1}
counter.unique_characters # => ["t", "h", "i", "s", " ", "a", "e"]

counter.frequency_of 't' # => 3
counter.frequency_of 'z' # => 0
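
On Ruby 2.7 or later, the manual counting loop could probably be replaced by `Enumerable#tally`. The sketch below is an alternative under that version assumption, not a change to the class above; the method name `sorted_char_frequency` is hypothetical. Ties are broken by character so the order is deterministic.

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
# Hypothetical helper: [char, count] pairs, most frequent first.
def sorted_char_frequency(text)
  text.each_char.tally.sort_by { |char, count| [-count, char] }
end

sorted = sorted_char_frequency("a, b, a.")
p sorted  #=> [[" ", 2], [",", 2], ["a", 2], [".", 1], ["b", 1]]
```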

Upvotes: 0
