Xibition

Reputation: 197

Count the frequency of a given word in a text file in Ruby

I want to count the number of occurrences of a given word (taken from user input, for example) in a text file. I have this code, but it gives me the occurrences of all the words in the file:

word_count = {}
my_word = id
File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    words = line.split(' ').each do |word|
      word_count[word] += 1 if word_count.has_key? my_word
      word_count[word] = 1 if not word_count.has_key? my_word
    end
  end
end

puts "\n"+ word_count.to_s

Thank you.

Upvotes: 1

Views: 4154

Answers (2)

Cary Swoveland

Reputation: 110675

Create a test file

Let's first create a file to work with.

text =<<-BITTER_END
It was the best of times, it was the worst of times, it was the age of wisdom,
it was the age of foolishness, it was the epoch of belief, it was the epoch of
incredulity, it was the season of Light, it was the season of Darkness, it was
the spring of hope, it was the winter of despair, we had everything before us,
we had nothing before us...
BITTER_END

FName = 'texte.txt'
File.write(FName, text)
  #=> 344

Specify the word to be counted

target = 'the'

Create a regular expression

r = /\b#{target}\b/i
  #=> /\bthe\b/i

The word breaks \b are used to ensure that, for example, 'anthem' is not counted as 'the'.
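
The effect of the word breaks can be checked on a short string. This is only a minimal sketch: the sample sentence is made up for illustration, and the call to Regexp.escape is an added precaution (not part of the answer above) that matters only if target might contain regex metacharacters such as '.' or '+'.

```ruby
target = 'the'

# Regexp.escape is an added precaution against metacharacters in target.
r = /\b#{Regexp.escape(target)}\b/i

# Hypothetical sample sentence, not taken from the test file.
sample = 'The anthem was sung in the theatre, then the other one.'

# Only standalone occurrences match; 'anthem', 'theatre', 'then'
# and 'other' contain 'the' but are not matched.
matches = sample.scan(r)
count = matches.count
puts count
```

Because of the /i flag, the capitalized 'The' at the start of the sentence is counted as well.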

Gulp small files

If, as here, the file is not humongous, you can gulp it:

File.read("texte.txt").scan(r).count
  #=> 10

Read large files line-by-line

If the file is so large that we'd want to read it line-by-line, do the following.

File.foreach(FName).reduce(0) { |cnt, line| cnt + line.scan(r).count }
  #=> 10

or

File.foreach(FName).sum { |line| line.scan(r).count }
  #=> 10

Note that Enumerable#sum made its debut in Ruby v2.4.

See IO::read and IO::foreach. (IO.methodx... is commonly written File.methodx.... This is permitted because File is a subclass of IO; i.e., File < IO #=> true.)

Use gsub to avoid the creation of a temporary array

The first method (gulping the file) creates a temporary array:

["the", "the", "the", "the", "the", "the", "the", "the", "the", "the"]

to which count (aka size) is applied. One way to avoid the creation of this array is to use String#gsub rather than String#scan, as the former, when used without a block, returns an enumerator:

File.read("texte.txt").gsub(r).count
  #=> 10

This could be used for each line of the file as well.
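
A line-by-line version might look like the following sketch. The temporary file and its two-line contents are assumptions made only to keep the example self-contained; with the test file created earlier you would pass FName to File.foreach instead.

```ruby
require 'tempfile'

r = /\bthe\b/i

# Hypothetical sample file, used only to make the sketch self-contained.
file = Tempfile.new('sample')
file.write("It was the best of times,\nit was the worst of times.\n")
file.close

# gsub with a pattern but no replacement and no block returns an enumerator,
# so no array of matches is built for any line.
# Enumerable#sum requires Ruby 2.4 or later.
count = File.foreach(file.path).sum { |line| line.gsub(r).count }
puts count

file.unlink
```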

This is an unconventional, but sometimes helpful, use of gsub.

Upvotes: 6

Gerry

Reputation: 10497

If you only want to get the count of a specific word, there is no need to use a Hash, for example:

word_count = 0
my_word = "input"

File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count += 1 if word == my_word
    end
  end
end

puts "\n" + word_count.to_s

word_count will contain the total number of occurrences of my_word.


If, on the other hand, you want to keep count of all words and then just print the count of a specific word, then you can use a Hash, but try something like this:

word_count = Hash.new(0)
my_word = "input"

File.open("texte.txt", "r") do |f|
  f.each_line do |line|
    line.split(' ').each do |word|
      word_count[word] += 1
    end
  end
end

puts "\n" + word_count[my_word].to_s

word_count will contain all words found with the total occurrences (words being the keys of the Hash and occurrences their values); to print the occurrences of my_word, you just need to get the value of the hash using my_word as key.
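
Note that split(' ') leaves punctuation attached and compares case-sensitively, so "times," and "It" are counted separately from "times" and "it". If that matters, one possible variation is sketched below. It assumes you want case-insensitive counts with punctuation ignored, which the question does not state, and it uses an in-memory string as a stand-in for the file contents.

```ruby
word_count = Hash.new(0)
my_word = "the"

# Hypothetical sample text standing in for the contents of texte.txt.
text = "It was the best of times,\nit was the worst of times."

text.each_line do |line|
  # Lowercase the line, then extract runs of word characters and
  # apostrophes, which drops commas, periods and other punctuation.
  line.downcase.scan(/[\w']+/).each do |word|
    word_count[word] += 1
  end
end

puts word_count[my_word]
```

For a real file, the same loop body could be used inside File.foreach("texte.txt") in place of text.each_line.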

Upvotes: 0
