Reputation: 48916
I'm trying to write a script that counts the number of words, with some exceptions described by regular expressions.
The script looks as follows:
number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.size {|word| word !~ standalone_number and word !~ standalone_letter and word !~ email_address } }
puts number_of_words
As you can see, I don't want to include standalone numbers, letters, or email addresses in the word count.
When I read a text file containing this information:
1 2 ruby [email protected]
I got a word count of 4, while I was expecting to get 1 (only ruby included in the count).
What am I missing here?
Thanks.
EDIT
I fixed the "standalone_letter" regular expression, which had mistakenly been written the same as the "email_address" regular expression.
I have solved the issue using a solution I have added to the answers.
Upvotes: 2
Views: 46
Reputation: 48916
This works!
text = File.open('xyz.txt', 'r')
number_of_words = 0
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text.each_line(){ |line| number_of_words = number_of_words + line.split.count {|word| word !~ standalone_number && word !~ standalone_letter && word !~ email_address }}
puts number_of_words
Upvotes: 0
Reputation: 2359
The problem is that you use size, which returns the number of elements in the array and does not accept a block. You have to use count instead, and everything will work.
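To make the difference concrete, here is a small sketch (the sample words and the variable names size_result and count_result are only illustrative). Ruby does not raise when a block is passed to a method that never uses it, which is why the original script ran without an error and simply counted every word:

```ruby
words = "1 2 ruby".split

# Array#size takes no block; Ruby silently ignores the one passed here,
# so the result is just the length of the array.
size_result = words.size { |word| word == "ruby" }

# Array#count with a block counts only the elements for which
# the block returns a truthy value.
count_result = words.count { |word| word == "ruby" }

puts size_result   # 3
puts count_result  # 1
```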
A much cleaner solution looks like this:
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
text = file.read
num_of_words = text.split.count{ |word| [standalone_number, standalone_letter, email_address].none?{ |regexp| word =~ regexp } }
puts num_of_words
Upvotes: 1
Reputation: 4551
You could also delete the matching words from the array as follows:
text.each_line { |line| number_of_words += line.split.delete_if { |word| word =~ standalone_number || word =~ standalone_letter || word =~ email_address }.size }
puts number_of_words
This will remove matching elements and then count the size of the array.
Upvotes: 0
Reputation: 53319
Array#size doesn't take a block like that. You're looking for Array#count.
line.split.count { ... }
Also, just a thought: instead of looping through the lines of the text and incrementing a counter, it looks like you could just call split directly on your original text, line breaks and all, and get the same result.
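A minimal sketch of that idea, assuming the whole text is already in a string. The sample string and the excluded array are made up for illustration; the patterns are the ones from the question, with the single-letter pattern from the asker's own answer:

```ruby
standalone_number = /\A[-+]?[0-9]*\.?[0-9]+\Z/
standalone_letter = /^[a-zA-Z]$/
email_address     = /\A([\w+\-].?)+@[a-z0-9\-]+(\.[a-z]+)*\.[a-z]+\Z/
excluded = [standalone_number, standalone_letter, email_address]

# Hypothetical sample input. String#split with no arguments splits on any
# whitespace, including newlines, so no per-line loop is needed.
text = "1 2 ruby a\nuser@example.com rails 3.5\n"

# Count only the words that match none of the excluded patterns.
number_of_words = text.split.count do |word|
  excluded.none? { |pattern| word =~ pattern }
end

puts number_of_words  # 2 ("ruby" and "rails")
```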
Upvotes: 2