user2839246

Reputation: 49

If/Else as a {Code Block}?

So I'm trying to define "#titleize," a method that will capitalize the first letters of all words in a string, except fluff words such as 'the,' 'and,' and 'if.'

My code so far:

def titleize(string)
words = []
stopwords = %w{the a by on for of are with just but and to the my had some in} 

string.scan(/\w+/) do |word|
    if !stopwords.include?(word) 
        words << word.capitalize
    else 
        words << word 
    end

words.join(' ')
end

My trouble is with the if/else section - I'm getting "syntax error, unexpected $end, expecting keyword_end" when I run the method on a string.

I think the code would work if I used the shorthand version of if/else, which normally goes into code blocks inside of {curly brackets}. I know that the syntax for this shorthand looks like something along the lines of

string.scan(/\w+/) { |word| !stopwords.include?(word) words << word.capitalize : words       
    <<  word }

...with

words << word.capitalize 

occurring if !stopwords.include?(word) returns true, and

words << word

occurring if !stopwords.include?(word) returns false. But this isn't working either!

It may also look something like this (which is a bit of a different/more efficient approach - no separate array instantiated):

string.scan(/\w+/) do |word|
    !stopwords.include?(word) word.capitalize : word
end.join(' ')

(From Calling methods within methods to Titleize in Ruby) ...but I'm receiving "syntax error" messages when I run this code as well.

So! Does anyone know the syntax I'm referring to? Could you help me remember it? Or, can you point out another reason that this bit of code isn't working?

Upvotes: 1

Views: 1237

Answers (4)

dancow

Reputation: 3388

Not only are you missing an end (to close the method), your words.join(' ') is inside the scan block, which means words gets joined on every iteration of scan.

I think you want this:

def titleize(string)
  words = []
  stopwords = %w{the a by on for of are with just but and to the my had some in} 

  string.scan(/\w+/) do |word|
      if !stopwords.include?(word) 
          words << word.capitalize
      else 
          words << word 
      end
  end

  words.join(' ')
end

While your code could be cleaned up, the basic flow is sound at this point.

Upvotes: 0

the Tin Man

Reputation: 160631

Active Support has the titleize method, which is a useful starting point because it capitalizes the words in a string. However, it's not entirely intelligent: it lays waste to stopwords. A touch of post-processing to restore them fixes this up nicely, though.

Here's how I'd do it:

require 'active_support/core_ext/string/inflections'

STOPWORDS = Hash[
  %w{the a by on for of are with just but and to the my had some in}.map{ |w| 
    [w.capitalize, w]
  }
]


def my_titlize(str)
  str.titleize.gsub(
    /(?!^)\b(?:#{ STOPWORDS.keys.join('|') })\b/,
    STOPWORDS
  )
end
# => /(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/

my_titlize('Jackdaws love my giant sphinx of quartz.')
# => "Jackdaws Love my Giant Sphinx of Quartz."

my_titlize('the rain in spain stays mainly in the plain.')
# => "The Rain in Spain Stays Mainly in the Plain."

my_titlize('Negative lookahead is indispensable')
# => "Negative Lookahead Is Indispensable"

The reason I do this is it's really easy to build a YAML file, or a database table, to provide the list of stopwords. From that array of words it's easy to build a hash, and a regex, which is fed to gsub, which then uses the regex engine to touch-up the stopwords.

The hash created is:

{
  "The"=>"the",
  "A"=>"a",
  "By"=>"by",
  "On"=>"on",
  "For"=>"for",
  "Of"=>"of",
  "Are"=>"are",
  "With"=>"with",
  "Just"=>"just",
  "But"=>"but",
  "And"=>"and",
  "To"=>"to",
  "My"=>"my",
  "Had"=>"had",
  "Some"=>"some",
  "In"=>"in"
}

The regex created is:

/(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/

When gsub gets a hit on a word in the regex pattern, it does a lookup in the hash and substitutes the value back into the string.
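To see the hash lookup in isolation, here is a minimal sketch. The shortened stopword list, the DEMO_ constants, and the restore_stopwords helper are made up for illustration and are not part of the code above.

```ruby
# Hypothetical, shortened stopword list, used only for this illustration.
DEMO_STOPWORDS = Hash[%w[the of in and].map { |w| [w.capitalize, w] }]
# => {"The"=>"the", "Of"=>"of", "In"=>"in", "And"=>"and"}

# (?!^) rejects a match at the start of a line, so a leading stopword
# keeps its capital letter.
DEMO_PATTERN = /(?!^)\b(?:#{DEMO_STOPWORDS.keys.join('|')})\b/

# Hypothetical helper: with a Hash as the second argument, gsub replaces
# each match with the value stored under that matched text.
def restore_stopwords(title)
  title.gsub(DEMO_PATTERN, DEMO_STOPWORDS)
end

restore_stopwords('The Lord Of The Rings') # => "The Lord of the Rings"
```

The leading "The" is left alone because of the lookahead, while the second "The" and "Of" are swapped for their lower-case hash values.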

The code could use downcase or other calculated ways of reversing the upper-cased words, but that adds overhead. gsub and the regex engine are very fast. Part of that is because the hash and the regex avoid looping over the stopword list, so that list can be huge without slowing the code much. The regex engine has changed across Ruby versions, though, and older versions don't do as well, so run benchmarks if you are on Ruby < 2.0.
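If you want to check the overhead claim on your own Ruby, a rough benchmark sketch could look like the following. The BENCH_ constants, the two helper methods, the sample sentence, and the iteration count are all assumptions for illustration; timings will vary by Ruby version and machine, so none are shown.

```ruby
require 'benchmark'

# Hypothetical setup for illustration; swap in your own stopword list and input.
BENCH_LOOKUP = Hash[
  %w[the a by on for of are with just but and to my had some in].map { |w|
    [w.capitalize, w]
  }
]
BENCH_PATTERN = /(?!^)\b(?:#{BENCH_LOOKUP.keys.join('|')})\b/
BENCH_TEXT = 'The Rain In Spain Stays Mainly In The Plain.'

# Replacement via hash lookup, as in the answer above.
def restore_with_hash(str)
  str.gsub(BENCH_PATTERN, BENCH_LOOKUP)
end

# Replacement via a block that calls downcase on every match.
def restore_with_block(str)
  str.gsub(BENCH_PATTERN) { |match| match.downcase }
end

Benchmark.bm(6) do |bm|
  bm.report('hash')  { 10_000.times { restore_with_hash(BENCH_TEXT) } }
  bm.report('block') { 10_000.times { restore_with_block(BENCH_TEXT) } }
end
```

Both helpers return the same string, so the comparison only measures how the replacement is computed.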

Upvotes: 1

Boris Stitnicky

Reputation: 12588

It is hard to hunt bugs in suboptimal code. Do it in a canonical way, and make possible errors easy to spot.

class String
  SQUELCH_WORDS = %w{the a by on for of are with just but and to the my had some in}

  def titleize
    gsub(/\w+/) do |s|
      SQUELCH_WORDS.include?( s ) ? s : s.capitalize
    end
  end
end

"20,000 miles under the sea".titleize #=> "20,000 Miles Under the Sea"

Upvotes: 0

tihom

Reputation: 8003

I think you are missing an end:

string.scan(/\w+/) do |word|
    if !stopwords.include?(word) 
        words << word.capitalize
    else 
        words << word 
    end
end #<<<<add this

For the shorthand version do this:

string.scan(/\w+/).map{|w| stopwords.include?(w) ? w : w.capitalize}.join(' ')
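The shorthand in question is the ternary operator, condition ? value_if_true : value_if_false; the attempts in the question leave out the ?. Note also that scan called with a block returns the original string, so chaining .join onto that block fails, which is why map is used here. A minimal sketch of the whole method, assuming the same stopword list as the question (the duplicate "the" is dropped, and the TITLE_STOPWORDS constant is my own naming):

```ruby
# Stopword list taken from the question, without the duplicate "the".
TITLE_STOPWORDS = %w[the a by on for of are with just but and to my had some in].freeze

def titleize(string)
  # scan without a block returns an array of words; map builds a new array
  # where the ternary keeps stopwords as-is and capitalizes everything else.
  string.scan(/\w+/)
        .map { |word| TITLE_STOPWORDS.include?(word) ? word : word.capitalize }
        .join(' ')
end

titleize('war and peace') # => "War and Peace"
```

Like the original, this leaves a stopword at the very start of the string in lower case and drops punctuation, since scan(/\w+/) only returns the word characters.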

Upvotes: 3
