Anon

Reputation: 9

Split a string by multiple delimiters

I want to split a string on whitespace, commas, and dots. Given this input:

"hello this is a hello, allright this is a hello."

I want to output:

hello 3
a 2
is 2
this 2
allright 1

I tried:

puts "Enter string "
text=gets.chomp
frequencies=Hash.new(0)
delimiters = [',', ' ', "."]
words = text.split(Regexp.union(delimiters))
words.each { |word| frequencies[word] +=1}
frequencies=frequencies.sort_by {|a,b| b}
frequencies.reverse!
frequencies.each { |wor,freq| puts "#{wor} #{freq}"}

This outputs:

hello 3
a 2
is 2
this 2
allright 1
 1

I do not want the last line of the output. It seems to treat the space as a word too. This may be because there are consecutive delimiters (a comma followed by a space).

Upvotes: 0

Views: 679

Answers (1)

SRack

Reputation: 12203

Use a regex:

str = 'hello this is a hello, allright this is a hello.'
str.split(/[.,\s]+/)
# => ["hello", "this", "is", "a", "hello", "allright", "this", "is", "a", "hello"]

This allows you to split a string by any of the three delimiters you've requested.

The full stop and comma are self-explanatory, and \s matches any whitespace character. The + matches one or more of these characters in a row, so a run of two or more delimiters is treated as a single split point instead of producing empty strings between them.
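
To make the effect of the + concrete, here is a small comparison. It is only a sketch: the sample string and the variable names are made up for illustration and are shorter than the input in the question.

```ruby
sample = 'hello, allright this.'

# Without +, every delimiter is its own split point, so the comma
# followed by a space leaves an empty string between them.
without_plus = sample.split(/[.,\s]/)

# With +, a run of consecutive delimiters counts as one split point.
with_plus = sample.split(/[.,\s]+/)

p without_plus # => ["hello", "", "allright", "this"]
p with_plus    # => ["hello", "allright", "this"]
```

One caveat: if the string starts with a delimiter, split still returns an empty string as the first element. Chaining .reject(&:empty?) onto the result should cover that case.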

You might find the explanation provided by Regex101 to be handy, available here: https://regex101.com/r/r4M7KQ/3.


Edit: for bonus points, here's a nice way to get the word counts using each_with_object :)

str.split(/[.,\s]+/).each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }
# => {"hello"=>3, "this"=>2, "is"=>2, "a"=>2, "allright"=>1}
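
If you also want the counts printed in the order shown in the question, one possible approach is to sort by count in descending order. The alphabetical tie-break below is an assumption based on the desired output in the question, and the names counts and sorted are only illustrative.

```ruby
str = 'hello this is a hello, allright this is a hello.'

counts = str.split(/[.,\s]+/).each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }

# Sort by count, highest first; break ties alphabetically (assumed ordering).
sorted = counts.sort_by { |word, count| [-count, word] }

sorted.each { |word, count| puts "#{word} #{count}" }
# hello 3
# a 2
# is 2
# this 2
# allright 1
```

Negating the count avoids the separate reverse! step, which would also reverse the order of any tied words.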

Upvotes: 7
