Reputation: 353
I have the name of a show, like "oferson of interest".
In my code I split it into single words, capitalize the first letter of each word, and then join them back together with a space between each word, which gives "Oferson Of Interest". I then want to search for the word "Of" and replace it with a lowercase "of".
The problem I can't figure out is that at the end of the program I get "oferson of Interest", which isn't what I want. I only wanted the word "of" to be lowercase, not the first letter of the word "Oferson". Put simply, I want the output "Oferson of Interest", not "oferson of Interest".
How can I search for the single word "of" rather than every place the letters "O" and "f" appear together in the sentence?
mine = 'oferson of interest'.split(' ').map { |w| w.capitalize }.join(' ')
if mine.include? "Of"
  mine.gsub!(/Of/, 'of')
else
  puts 'nothing'
end
puts mine
Upvotes: 0
Views: 59
Reputation: 160551
You're dealing with "stop words": words you don't want to process for some reason. Build a list of the stop words you want to ignore, and compare each word against it to decide whether to process it further:
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
  puts text.split.map { |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
end
# >> a Stitch In Time Saves Nine
# >> the Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
That's a simple example, but it shows the basics. In real life you'll also want to handle punctuation and hyphenated words.
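One way to handle punctuation and hyphens is to let gsub find each word and leave everything between the words untouched. This is a minimal sketch, assuming a "word" is a run of ASCII letters and apostrophes; titleize_words is a hypothetical name. The regex here only picks out the words. Deciding what to do with each one is still the Set lookup.

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper: capitalizes every run of letters (apostrophes allowed)
# unless it is a stop word. Hyphens, commas and other punctuation are not
# matched by the pattern, so gsub copies them through unchanged.
def titleize_words(text)
  text.gsub(/[A-Za-z']+/) do |word|
    STOPWORDS.include?(word.downcase) ? word.downcase : word.capitalize
  end
end

puts titleize_words('the man-of-the-year, of course')
# >> the Man-of-the-Year, of Course
```

Note that the leading "the" stays lowercase; forcing the first letter back to uppercase is a separate step.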
I used a Set because lookups stay fast as the list of stop words grows. A Set is hash-based, so its include? is a constant-time lookup rather than the linear scan that include? performs on an Array:
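require 'set'
require 'fruity'
LETTER_ARRAY = ('a'..'z').to_a
LETTER_SET = LETTER_ARRAY.to_set
# '0' isn't in either collection, so this is the worst case for Array#include?:
# it has to scan all 26 elements before returning false.
compare do
  array { LETTER_ARRAY.include?('0') }
  set   { LETTER_SET.include?('0') }
end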
# >> Running each test 16384 times. Test will take about 2 seconds.
# >> set is faster than array by 10x ± 0.1
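fruity is a third-party gem. If you'd rather not install it, a rough equivalent using Ruby's standard Benchmark module is sketched below. It prints raw timings instead of a ratio, the iteration count is an arbitrary choice, and the numbers will vary by machine and Ruby version.

```ruby
require 'set'
require 'benchmark'

LETTER_ARRAY = ('a'..'z').to_a
LETTER_SET = LETTER_ARRAY.to_set
ITERATIONS = 200_000 # arbitrary; raise it if the timings are too small to compare

# '0' is not in either collection: Array#include? scans all 26 elements,
# while Set#include? does a single hash-based lookup.
Benchmark.bm(5) do |x|
  x.report('array') { ITERATIONS.times { LETTER_ARRAY.include?('0') } }
  x.report('set')   { ITERATIONS.times { LETTER_SET.include?('0') } }
end
```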
Things get more interesting when the first word of the title is itself a stop word ("A", "The"), since it should still be capitalized. The simple trick is to force just that first letter back to uppercase afterwards:
require 'set'
STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]
TEXT.each do |text|
  str = text.split.map { |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
  # Force the first character back to uppercase in case the title starts with a stop word
  str[0] = str[0].upcase
  puts str
end
# >> A Stitch In Time Saves Nine
# >> The Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country
This isn't a good task for a regular expression, unless you're dealing with very consistent text patterns. Since you're working on the names of TV shows, odds are good you're not going to find much consistency and your pattern would grow in complexity quickly.
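Applied to the title from the question, the split-and-compare approach can be wrapped in a small method. titleize is a hypothetical name, and the empty-string guard is an addition: on an empty string str[0] returns nil, so calling upcase on it would raise NoMethodError.

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper wrapping the approach above: lowercase stop words,
# capitalize everything else, then force the first character back to uppercase.
def titleize(text)
  str = text.split.map { |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
  # Guard against empty input: ''[0] is nil, and nil has no #upcase.
  str[0] = str[0].upcase unless str.empty?
  str
end

puts titleize('oferson of interest')
puts titleize('of mice and men')
# >> Oferson of Interest
# >> Of Mice And Men
```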
Upvotes: 0
Reputation: 407
The simplest answer is to use word boundaries in your regular expression:
str = "oferson of interest".split.collect(&:capitalize).join(" ")
# The "Of" at the start of "Oferson" isn't followed by a word boundary, so it's left alone
str.gsub!(/\bOf\b/i, 'of')
# => Oferson of Interest
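To see why the \b anchors matter, and one caveat that comes with them, compare the patterns below. \b matches between a word character and a non-word character (or the edge of the string), so /\bOf\b/ can't match the "Of" at the start of "Oferson". The flip side is that a title beginning with "Of" is a standalone word too, so it gets lowercased. Requiring a preceding space with a fixed-width lookbehind is one possible workaround; that part is an assumption, not part of the answer above, and it assumes words are separated by spaces.

```ruby
title = 'oferson of interest'.split.map(&:capitalize).join(' ')

# Without \b, /Of/ also matches the first two letters of "Oferson".
naive = title.gsub(/Of/, 'of')                        # => "oferson of Interest"

# With \b on both sides, only the standalone word matches.
bounded = title.gsub(/\bOf\b/, 'of')                  # => "Oferson of Interest"

# Caveat: a leading "Of" is also a standalone word, so it gets lowercased too.
leading = 'Of Mice And Men'.gsub(/\bOf\b/, 'of')      # => "of Mice And Men"

# Assumed workaround: only match "Of" when a space precedes it,
# which leaves the first word of the title alone.
guarded = 'Of Mice And Men Of Note'.gsub(/(?<= )Of\b/, 'of') # => "Of Mice And Men of Note"
```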
Upvotes: 1