user3131148

Reputation: 353

How can I search for a word using Ruby?

I have the name of a show, like oferson of interest.

In my code I split it into single words, capitalize the first letter of each word, then join the words back together with a space between them, which gives: Oferson Of Interest. I then want to search for the word Of and replace it with the lowercase of.

The problem I can't figure out is that at the end of the program I get oferson of Interest, which isn't what I want. I only wanted the word "of" to be lowercase, not the first letter of "Oferson". Simply put, I wanted Oferson of Interest, not oferson of Interest.

How can I search for the single word 'of' rather than every instance of the letters 'o' and 'f' in the sentence?

mine = 'oferson of interest'.split(' ').map { |w| w.capitalize }.join(' ')
if mine.include? "Of"
  mine.gsub!(/Of/, 'of')
else
  puts 'nothing'
end

puts mine

Upvotes: 0

Views: 59

Answers (2)

the Tin Man

Reputation: 160551

You're dealing with "stop words": words you don't want to process for some reason. Build a list of the stop words you want to ignore, and compare each word against it to decide whether to do further processing on that word:

require 'set'

STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]

TEXT.each do |text|
  puts text.split.map{ |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
end
# >> a Stitch In Time Saves Nine
# >> the Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country

That's a simple example, but it shows the basics. In real life you'll also want to handle punctuation and hyphenated words.
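As a rough sketch of how punctuation and hyphens might be handled: instead of splitting on whitespace, run gsub over each run of letters so that hyphens, quotes and commas are left where they are. The titleize method name and the sample title are made up for illustration, and the regex is only an assumption about what counts as a word (letters, with optional apostrophes inside the word).

```ruby
require 'set'

STOPWORDS = %w[a for is of the to].to_set

# Hypothetical helper, not a built-in Ruby method.
def titleize(text)
  # Visit each run of letters (apostrophes allowed inside a word) and leave
  # spaces, hyphens and other punctuation untouched.
  result = text.gsub(/[[:alpha:]]+(?:'[[:alpha:]]+)*/) do |word|
    STOPWORDS.include?(word.downcase) ? word.downcase : word.capitalize
  end
  # Force the first letter of the title to uppercase, even in a stop word.
  result.sub(/[[:alpha:]]/) { |c| c.upcase }
end

puts titleize('the well-known tale of "oferson", part two')
# >> The Well-Known Tale of "Oferson", Part Two
```

Each half of a hyphenated word is treated as its own word here, which may or may not be the style you want for show titles.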

I used a Set because lookups stay fast as the list of stop words grows. Like a Hash, a Set finds members by hashing, so the check is faster than scanning an array with include?:

require 'set'
require 'fruity'

LETTER_ARRAY = ('a' .. 'z').to_a
LETTER_SET = LETTER_ARRAY.to_set

compare do
  array { LETTER_ARRAY.include?('0') }
  set { LETTER_SET.include?('0') }
end
# >> Running each test 16384 times. Test will take about 2 seconds.
# >> set is faster than array by 10x ± 0.1

It gets more interesting when you want to protect the first letter of the resulting string, but the simple trick is to force just that letter back to uppercase if it matters:

require 'set'

STOPWORDS = %w[a for is of the to].to_set
TEXT = [
  'A stitch in time saves nine',
  'The quick brown fox jumped over the lazy dog',
  'Now is the time for all good men to come to the aid of their country'
]

TEXT.each do |text|
  str = text.split.map{ |w|
    STOPWORDS.include?(w.downcase) ? w.downcase : w.capitalize
  }.join(' ')
  str[0] = str[0].upcase
  puts str
end
# >> A Stitch In Time Saves Nine
# >> The Quick Brown Fox Jumped Over the Lazy Dog
# >> Now is the Time for All Good Men to Come to the Aid of Their Country

This isn't a good task for a regular expression, unless you're dealing with very consistent text patterns. Since you're working on the names of TV shows, odds are good you're not going to find much consistency and your pattern would grow in complexity quickly.

Upvotes: 0

Andy Henson

Reputation: 407

The simplest answer is to use word boundaries in your regular expression:

str = "oferson of interest".split.collect(&:capitalize).join(" ")
str.gsub!(/\bOf\b/i, 'of')
# => Oferson of Interest
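To see what the word boundaries change, here is a small sketch comparing the pattern from the question with an anchored one. The lowercase_word helper is a hypothetical name, not a built-in; it uses Regexp.escape so the word is matched literally.

```ruby
# Hypothetical helper: lowercase every standalone occurrence of `word`.
def lowercase_word(title, word)
  # \b only matches at the edge of a word, so the "Of" at the start of
  # "Oferson" is skipped because a letter follows it.
  title.gsub(/\b#{Regexp.escape(word)}\b/i) { |match| match.downcase }
end

title = 'oferson of interest'.split.map(&:capitalize).join(' ')

puts title.gsub(/Of/, 'of')       # oferson of Interest
puts lowercase_word(title, 'of')  # Oferson of Interest
```

Without \b the pattern matches the letters wherever they appear, which is why the first word was lowercased in the question.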

Upvotes: 1
