Reputation: 1876
I'm trying to parse words out of a string and put them into an array. I've tried the following:
@string1 = "oriented design, decomposition, encapsulation, and testing. Uses "
puts @string1.scan(/\s([^\,\.\s]*)/)
It seems to do the trick, but it's a bit shaky (for example, I would need to include more special characters). Is there a better way to do this in Ruby?
Optional: I have a CS course description. I intend to extract all the words from it into an array of strings, remove the most common English words from that array, and then use the remaining words as tags that users can use to search for CS courses.
Upvotes: 38
Views: 68805
Reputation: 86
I would write something like this:
@string1
.split(/,+|\s+/) # split on runs of commas or whitespace (space, tab, newline)
.reject(&:empty?)
.map { |w| w.gsub(/\A\W+|\W+\z/, '') } # \A\W+ => any leading punctuation; \W+\z => any trailing punctuation
irb(main):047:0> @string1 = "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
=> "oriented design, 'with', !!qwe, and testing. can't rubyisgood#)(*#%)(*, and,rails,is,good"
irb(main):048:0> @string1.split(/,+|\s+/).reject(&:empty?).map { |w| w.gsub(/\A\W+|\W+\z/, '') }
=> ["oriented", "design", "with", "qwe", "and", "testing", "can't", "rubyisgood", "and", "rails", "is", "good"]
Upvotes: 1
Reputation: 229
For me, the best way to split sentences into words is:
line.split(/[^[[:word:]]]+/)
It works well even with multilingual words and punctuation marks:
line = 'English words, Polski Żurek!!! crème fraîche...'
line.split(/[^[[:word:]]]+/)
=> ["English", "words", "Polski", "Żurek", "crème", "fraîche"]
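The reason this matters: in Ruby 1.9 and later, \w and \W only treat ASCII letters, digits, and underscore as word characters, while [[:word:]] is Unicode-aware. Below is a small sketch of the difference, assuming a UTF-8 source file; the variable names are only for illustration. It also uses scan instead of split, which is one way to avoid the empty leading element that split produces when the string starts with punctuation.

```ruby
line = 'English words, Polski Żurek!!! crème fraîche...'

# \W is ASCII-only, so accented letters are treated as delimiters.
ascii_only = line.split(/\W+/)
# => ["English", "words", "Polski", "urek", "cr", "me", "fra", "che"]

# [[:word:]] covers Unicode letters; scan collects the matches directly.
unicode_words = line.scan(/[[:word:]]+/)
# => ["English", "words", "Polski", "Żurek", "crème", "fraîche"]

# With leading punctuation, split keeps an empty first element; scan does not.
with_split = '...hello world'.split(/[^[:word:]]+/) # => ["", "hello", "world"]
with_scan  = '...hello world'.scan(/[[:word:]]+/)   # => ["hello", "world"]
```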
Upvotes: 22
Reputation: 6837
For Rails you can use something like this:
@string1.split(/\s/).delete_if(&:blank?)
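Outside Rails (where blank? is not available), a rough equivalent is sketched below. It relies on the fact that String#split with no arguments splits on runs of whitespace and ignores leading whitespace, so no empty strings need to be filtered out. The sample string is made up for illustration.

```ruby
text = "  a  b\tc\n"

# Splitting on single whitespace characters leaves empty strings behind...
raw = text.split(/\s/)            # => ["", "", "a", "", "b", "c"]

# ...which can be rejected in plain Ruby with empty? instead of blank?.
cleaned = raw.reject(&:empty?)    # => ["a", "b", "c"]

# split with no arguments gives the same result in one step.
simple = text.split               # => ["a", "b", "c"]
```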
Upvotes: 1
Reputation: 1086
Well, you could split the string on spaces, if that's your delimiter of interest:
@string1.split(' ')
Or split on non-word characters or word boundaries:
\W # Any non-word character
\b # A word boundary (a zero-width position, not a character)
Or on whitespace:
\s # Any whitespace character
Hint: try testing each of these on http://rubular.com
And note that Ruby 1.9 has some differences from 1.8.
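To make the options concrete, here is a quick sketch of what each pattern does with the same input on a current Ruby. The sample string is a shortened version of the one in the question, and the variable names are only for illustration.

```ruby
s = "design, testing. Uses"

# Splitting on spaces keeps punctuation attached to the words.
on_space = s.split(' ')        # => ["design,", "testing.", "Uses"]

# Splitting on runs of non-word characters drops the punctuation.
on_non_words = s.split(/\W+/)  # => ["design", "testing", "Uses"]

# Splitting on word boundaries keeps the separators as their own elements.
on_boundary = s.split(/\b/)    # => ["design", ", ", "testing", ". ", "Uses"]
```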
Upvotes: 14
Reputation: 21572
Use the split method:
words = @string1.split(/\W+/)
This splits the string into an array based on a regular expression. \W matches any "non-word" character, and the "+" means one or more, so a run of consecutive delimiters is treated as a single separator.
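For the optional tagging part of the question, a minimal sketch might look like the following. The STOP_WORDS list and the extract_tags helper are hypothetical names made up for this example, and the list is deliberately tiny; a real one would be much longer. Note that split(/\W+/) can return an empty first element if the text starts with punctuation, so empty strings are rejected as well.

```ruby
require 'set'

# Hypothetical, deliberately short list of common English words.
STOP_WORDS = Set.new(%w[a an and the of to in is]).freeze

# Returns unique lowercase tags from free text, with common words removed.
def extract_tags(text)
  text.downcase
      .split(/\W+/)
      .reject { |w| w.empty? || STOP_WORDS.include?(w) }
      .uniq
end

description = "oriented design, decomposition, encapsulation, and testing. Uses "
tags = extract_tags(description)
# => ["oriented", "design", "decomposition", "encapsulation", "testing", "uses"]
```

A Set is used so that the membership check stays fast as the stop-word list grows.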
Upvotes: 69