Tom Lehman

Reputation: 89373

Best way to count words in a string in Ruby?

Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
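A minimal sketch of what that expression does, assuming a "word" is a run of word characters and hyphens. Because the pattern contains a capture group, scan returns the captures rather than the matched words (the count is unaffected); a character class returns the words themselves. The helper name count_words is hypothetical.

```ruby
# Hypothetical helper: counts runs of word characters and hyphens.
def count_words(string)
  string.scan(/[\w\-]+/).size
end

text = "one-way street"

# With a capture group, scan yields the last capture of each match.
p text.scan(/(\w|-)+/)   #=> [["y"], ["t"]]
# With a character class, scan yields the matched words.
p text.scan(/[\w\-]+/)   #=> ["one-way", "street"]
p count_words(text)      #=> 2
```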

Upvotes: 33

Views: 44828

Answers (8)

For example, to count the words of "War and Peace" stored in a file:

acc = 0
File.readlines('war_and_peace.txt').each { |line| acc += line.split.size }

acc
 => 238610
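A possible variation, assuming Ruby 2.4 or later for Enumerable#sum: File.foreach streams the file line by line instead of reading every line into an array first. The helper name count_words_in_file and the temporary sample file are hypothetical stand-ins.

```ruby
require 'tempfile'

# Hypothetical helper: streams the file line by line instead of
# loading every line into memory first.
def count_words_in_file(path)
  File.foreach(path).sum { |line| line.split.size }
end

# Demo on a small temporary file (a stand-in for war_and_peace.txt).
Tempfile.create('sample') do |file|
  file.write("Well, Prince, so Genoa\nand Lucca are now\n")
  file.flush
  p count_words_in_file(file.path)  #=> 8
end
```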

Upvotes: 0

KitsuneYMG

Reputation: 12901

string.split.size

Edited to explain multiple spaces

From the Ruby String Documentation page

split(pattern=$;, [limit]) → anArray

Divides str into substrings based on a delimiter, returning an array of these substrings.

If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.

If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.

If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.

If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.

" now's  the time".split        #=> ["now's", "the", "time"]

While that is the current version of ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
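The documented behaviors can be checked with a few one-liners. This is only an illustrative sketch; the sample strings other than the one from the documentation are made up.

```ruby
s = " now's  the time"

p s.split          #=> ["now's", "the", "time"]
p s.split(' ')     #=> ["now's", "the", "time"]
# A positive limit caps the number of fields; the rest stays unsplit.
p s.split(' ', 2)  #=> ["now's", "the time"]
# Trailing empty fields are dropped unless the limit is negative.
p "a,b,,".split(',')      #=> ["a", "b"]
p "a,b,,".split(',', -1)  #=> ["a", "b", "", ""]
# A zero-length match splits the string into individual characters.
p "abc".split(//)         #=> ["a", "b", "c"]
```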

Upvotes: 61

Akhil Gautam

Reputation: 169

The best way to do this is to use the split method. split divides a string into substrings based on a delimiter, returning an array of the substrings. split takes two parameters: pattern and limit. pattern is the delimiter on which the string is split into an array. limit specifies the maximum number of elements in the resulting array. For more details, refer to the Ruby String documentation.

str = "This is a string"
str.split(' ').size
#output: 4

The code above splits the string on whitespace, so the size of the resulting array is the number of words in the string.
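One detail worth illustrating: a single-space String pattern is treated specially, while a single-space Regexp is not. The sketch below uses a made-up string with repeated spaces to show the difference.

```ruby
str = "This  is a   string"   # note the repeated spaces

# A single-space String pattern splits on runs of whitespace.
p str.split(' ').size   #=> 4
# A single-space Regexp splits at every space, leaving empty strings.
p str.split(/ /)        #=> ["This", "", "is", "a", "", "", "string"]
p str.split(/ /).size   #=> 7
```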

Upvotes: 1

abonn

Reputation: 61

This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.

puts "Enter a sentence to count its words: "
sentence = gets.chomp
# split(" ") breaks the sentence on runs of whitespace
word_count = sentence.split(" ").length

puts "Your sentence is #{word_count} words long"
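One way to leave numbers out of the count, sketched under the assumption that a "number" is a token made up only of digits. The helper name count_non_numeric_words is hypothetical, and String#match? needs Ruby 2.4 or later.

```ruby
# Hypothetical helper: counts whitespace-separated tokens, skipping
# tokens that consist only of digits.
def count_non_numeric_words(sentence)
  sentence.split.reject { |token| token.match?(/\A\d+\z/) }.size
end

p count_non_numeric_words("I have 2 cats and 10 fish")  #=> 5
```

Tokens such as "3.5" or "2nd" would still be counted under this definition.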

Upvotes: 1

Mohamad

Reputation: 35359

I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.

The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.

counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]

# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]

# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]

The gem provides a bunch more useful methods.

Upvotes: 13

Lri

Reputation: 27633

This splits words only on ASCII whitespace chars:

p "  some word\nother\tword|word".strip.split(/\s+/).size #=> 4
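To illustrate the "ASCII only" caveat: in Ruby, \s does not match a non-breaking space (U+00A0), whereas the POSIX bracket class [[:space:]] does. A small sketch with a made-up string, assuming UTF-8 source encoding:

```ruby
text = "some\u00A0word other"   # U+00A0 is a non-breaking space

# \s matches only ASCII whitespace, so the non-breaking space stays put.
p text.split(/\s+/).size           #=> 2
# The POSIX bracket class also matches Unicode whitespace.
p text.split(/[[:space:]]+/).size  #=> 3
```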

Upvotes: 0

Hillel

Reputation: 49

The split(/[^-a-zA-Z]/) solution is wrong when two separators are adjacent. Consider the following (note the two spaces):

"one-way  street"

You will get

["one-way","", "street"]

Use

'one-way  street'.gsub(/[^-a-zA-Z]/, ' ').split.size
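A short sketch of the difference, using the two-space string from above. The last line is an alternative that is not part of this answer.

```ruby
text = "one-way  street"   # two spaces between the words

# Each separator character is its own split point, so the second
# space produces an empty string.
p text.split(/[^-a-zA-Z]/)  #=> ["one-way", "", "street"]
# Normalizing separators to spaces and using a plain split avoids it.
p text.gsub(/[^-a-zA-Z]/, ' ').split.size  #=> 2
# Another option (not from the answer): split on runs of separators.
p text.split(/[^-a-zA-Z]+/).size  #=> 2
```

Note that splitting on runs of separators still yields a leading empty string when the text starts with a separator, which the gsub-then-split form does not.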

Upvotes: 0

user128666

Reputation:

If a 'word' in this case can be described as a sequence of letters that may include '-', then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):


>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]

However, other symbols may need to be included in the regex, for example ' to support words like "it's".
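A sketch of that extension, assuming a word is a run of letters, hyphens, and apostrophes. It uses scan to collect the matches directly instead of splitting on the complement; the variable name word_pattern is hypothetical.

```ruby
# Assumed word pattern: letters, hyphens, and apostrophes.
word_pattern = /[-'a-zA-Z]+/

text = "it's a one-way street"
p text.scan(word_pattern)       #=> ["it's", "a", "one-way", "street"]
p text.scan(word_pattern).size  #=> 4
```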

Upvotes: 2
