Simplicity

Reputation: 48916

Not getting the expected output from the script

I'm trying to write a script that counts the number of words in a text, with some exceptions described by regular expressions.

The script looks as follows:

number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.size {|word| word !~ standalone_number and word !~ standalone_letter and word !~ email_address  } }
puts number_of_words

As you can see, I don't want to include standalone numbers, standalone letters, or email addresses in the word count.

When I read a text file containing this information:

1 2 ruby [email protected]

I got a word count of 4, while I was expecting 1 (only ruby should be counted).

What am I missing here?

Thanks.

EDIT

I fixed the "standalone_letter" regular expression; by mistake it had been written identically to the "email_address" regular expression.

I have solved the issue; my solution is posted among the answers.

Upvotes: 2

Views: 46

Answers (4)

Simplicity

Reputation: 48916

This works!

text = File.open('xyz.txt', 'r')
number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line { |line| number_of_words += line.split.count { |word| word !~ standalone_number && word !~ standalone_letter && word !~ email_address } }
puts number_of_words

Upvotes: 0

Nafaa Boutefer

Reputation: 2359

The problem is that you use size, which returns the number of elements in the array and silently ignores any block you pass to it. Use count instead, which counts only the elements for which the block returns a true value.

A much cleaner solution looks like this:
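To make the difference concrete, here is a minimal sketch. The sample tokens and the two helper method names are made up for illustration and are not part of the original script.

```ruby
# Hypothetical sample tokens, standing in for the result of line.split.
SAMPLE_TOKENS = %w[1 2 ruby].freeze

# Array#size returns the element count; the block is never called.
def size_with_block(words)
  words.size { |word| word == "ruby" }
end

# Array#count with a block counts only the elements the block accepts.
def count_with_block(words)
  words.count { |word| word == "ruby" }
end

puts size_with_block(SAMPLE_TOKENS)   # prints 3
puts count_with_block(SAMPLE_TOKENS)  # prints 1
```

This is why the original script reported every token on the line: the filtering block was never evaluated.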

standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/

text = file.read # file is assumed to be an already opened File object
num_of_words = text.split.count{ |word| [standalone_number, standalone_letter, email_address].none?{ |regexp| word =~ regexp } }

puts num_of_words

Upvotes: 1

ShellFish

Reputation: 4551

You could also delete the matching words from the array as follows:

text.each_line { |line| number_of_words += line.split.delete_if { |word| word =~ standalone_number || word =~ standalone_letter || word =~ email_address }.size }
puts number_of_words

This will remove matching elements and then count the size of the array.
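As a rough illustration of the same idea, the sketch below uses reject (the non-mutating counterpart of delete_if) so the surviving words can be inspected before they are counted. The constant names, the helper surviving_words, and the sample line are assumptions made for this example; the two patterns are the number and single-letter patterns from this thread.

```ruby
# Patterns taken from the thread: standalone numbers and single letters.
NUMBER_PATTERN = /\A[-+]?[0-9]*\.?[0-9]+\Z/
LETTER_PATTERN = /^[a-zA-Z]$/
FILTERS = [NUMBER_PATTERN, LETTER_PATTERN].freeze

# Returns the words that match none of the filters (hypothetical helper).
def surviving_words(line)
  line.split.reject { |word| FILTERS.any? { |pattern| word =~ pattern } }
end

p surviving_words("1 2 a ruby")       # prints ["ruby"]
puts surviving_words("1 2 a ruby").size  # prints 1
```

A word is dropped as soon as it matches any one of the patterns, which is why the conditions are combined with "or" rather than "and".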

Upvotes: 0

Ben Lee

Reputation: 53319

Array#size doesn't take a block like that. You're looking for Array#count.

line.split.count { ... } 

Also, just a thought: instead of looping through the lines of the text and incrementing a counter, it looks like you could call split directly on your original text, line breaks and all, and get the same result, because split with no arguments splits on any whitespace, including newlines.
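A small sketch comparing the two approaches, assuming the text is already in memory as a string. The patterns come from this thread; the method names, the sample text, and the address someone@example.com are invented for illustration.

```ruby
STANDALONE_NUMBER = /\A[-+]?[0-9]*\.?[0-9]+\Z/
STANDALONE_LETTER = /^[a-zA-Z]$/
EMAIL_ADDRESS = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
EXCLUDED = [STANDALONE_NUMBER, STANDALONE_LETTER, EMAIL_ADDRESS].freeze

# True when the word matches none of the excluded patterns.
def countable?(word)
  EXCLUDED.none? { |pattern| word =~ pattern }
end

# Line-by-line version, as in the question.
def count_per_line(text)
  text.each_line.sum { |line| line.split.count { |word| countable?(word) } }
end

# Whole-text version: split with no arguments also splits on newlines.
def count_whole(text)
  text.split.count { |word| countable?(word) }
end

# Hypothetical sample input.
SAMPLE_TEXT = "1 2 ruby someone@example.com\na b words here\n"

puts count_per_line(SAMPLE_TEXT)  # prints 3
puts count_whole(SAMPLE_TEXT)     # prints 3
```

Both versions count ruby, words, and here. The whole-text version reads the entire input into memory, which is fine for small files but may matter for very large ones.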

Upvotes: 2
