Reputation: 49
So I'm trying to define "#titleize," a method that will capitalize the first letters of all words in a string, except fluff words such as 'the,' 'and,' and 'if.'
My code so far:
def titleize(string)
words = []
stopwords = %w{the a by on for of are with just but and to the my had some in}
string.scan(/\w+/) do |word|
if !stopwords.include?(word)
words << word.capitalize
else
words << word
end
words.join(' ')
end
My trouble is with the if/else section - I'm getting "syntax error, unexpected $end, expecting keyword_end" when I run the method on a string.
I think the code would work if I used the shorthand version of if/else, which normally goes into code blocks inside of {curly brackets}. I know that the syntax for this shorthand looks like something along the lines of
string.scan(/\w+/) { |word| !stopwords.include?(word) words << word.capitalize : words << word }
...with
words << word.capitalize
occurring if !stopwords.include?(word) returns true, and
words << word
occurring if !stopwords.include?(word) returns false. But this isn't working either!
It may also look something like this (which is a bit of a different/more efficient approach - no separate array instantiated):
string.scan(/\w+/) do |word|
  !stopwords.include?(word) word.capitalize : word
end.join(' ')
(From Calling methods within methods to Titleize in Ruby) ...but I'm receiving "syntax error" messages when I run this code as well.
So! Does anyone know the syntax I'm referring to? Could you help me remember it? Or, can you point out another reason that this bit of code isn't working?
Upvotes: 1
Views: 1237
Reputation: 3388
Not only are you missing an end (to close the method), your words.join(' ') is inside the scan block, which means it runs on every iteration of scan instead of once after the loop finishes.
I think you want this:
def titleize(string)
  words = []
  stopwords = %w{the a by on for of are with just but and to the my had some in}
  string.scan(/\w+/) do |word|
    if !stopwords.include?(word)
      words << word.capitalize
    else
      words << word
    end
  end
  words.join(' ')
end
While your code could be cleaned up, the basic flow is sound at this point.
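One behavior of the fixed method worth knowing: because the stopword check applies to every word, a string that starts with a stopword keeps it lowercase ('the rain' comes out as 'the Rain'). If you want the first word capitalized regardless (an assumption on my part, not something the question asks for), a minimal sketch is to track each word's index:

```ruby
STOPWORDS = %w{the a by on for of are with just but and to my had some in}

# Variant of the fixed titleize that always capitalizes the first word,
# even when it is a stopword (assumed requirement, not from the question).
def titleize(string)
  words = []
  string.scan(/\w+/).each_with_index do |word, i|
    if i.zero? || !STOPWORDS.include?(word)
      words << word.capitalize
    else
      words << word
    end
  end
  words.join(' ')
end

titleize('the rain in spain') # => "The Rain in Spain"
```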
Upvotes: 0
Reputation: 160631
Active Support has the titleize method, which is a useful starting point because it capitalizes the words in a string. However, it's not entirely intelligent: it capitalizes the stopwords too. A touch of post-processing to restore them fixes this up nicely.
Here's how I'd do it:
require 'active_support/core_ext/string/inflections'
STOPWORDS = Hash[
  %w{the a by on for of are with just but and to the my had some in}.map { |w|
    [w.capitalize, w]
  }
]
def my_titlize(str)
  str.titleize.gsub(
    /(?!^)\b(?:#{ STOPWORDS.keys.join('|') })\b/,
    STOPWORDS
  )
end
# The pattern built inside my_titlize evaluates to:
# /(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/
my_titlize('Jackdaws love my giant sphinx of quartz.')
# => "Jackdaws Love my Giant Sphinx of Quartz."
my_titlize('the rain in spain stays mainly in the plain.')
# => "The Rain in Spain Stays Mainly in the Plain."
my_titlize('Negative lookahead is indispensable')
# => "Negative Lookahead Is Indispensable"
The reason I do it this way is that it's really easy to build a YAML file, or a database table, to provide the list of stopwords. From that array of words it's easy to build a hash and a regex, which are fed to gsub, which then uses the regex engine to touch up the stopwords.
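As a rough sketch of that idea, assuming the stopword list is stored as a YAML array (the inline heredoc stands in for a real file, and restore_stopwords is a hypothetical helper name), the hash and regex can be built like this and applied to an already-titleized string. One gotcha: Ruby's Psych loader follows YAML 1.1, where an unquoted on loads as the boolean true, so a stopword like that needs quoting:

```ruby
require 'yaml'

# Stand-in for a stopwords.yml file (hypothetical contents).
STOPWORDS_YAML = <<~YAML
  - the
  - a
  - "on"
  - of
  - in
  - my
YAML

words = YAML.safe_load(STOPWORDS_YAML)

# Map each capitalized form back to its lowercase original.
stopword_map = Hash[words.map { |w| [w.capitalize, w] }]

# Match a capitalized stopword anywhere except at the very start of the string.
stopword_re = /(?!^)\b(?:#{stopword_map.keys.map { |k| Regexp.escape(k) }.join('|')})\b/

# Hypothetical helper: lowercase the stopwords in an already-titleized string.
def restore_stopwords(titled, map, pattern)
  titled.gsub(pattern, map)
end

restore_stopwords("The Rain In Spain Stays Mainly In The Plain.", stopword_map, stopword_re)
# => "The Rain in Spain Stays Mainly in the Plain."
```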
The hash created is:
{
  "The"=>"the",
  "A"=>"a",
  "By"=>"by",
  "On"=>"on",
  "For"=>"for",
  "Of"=>"of",
  "Are"=>"are",
  "With"=>"with",
  "Just"=>"just",
  "But"=>"but",
  "And"=>"and",
  "To"=>"to",
  "My"=>"my",
  "Had"=>"had",
  "Some"=>"some",
  "In"=>"in"
}
The regex created is:
/(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/
When gsub gets a hit on a word in the regex pattern, it looks up the matched text in the hash and substitutes the value back into the string.
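A small standalone illustration of String#gsub with a hash as the replacement (the strings are just examples). It also shows a detail worth knowing: a match with no key in the hash is replaced by the hash's default value, nil, which becomes an empty string. That is one reason the regex is built only from the hash's own keys.

```ruby
lower = { "Of" => "of", "And" => "and" }

# Every match is a key in the hash, so each one is swapped for its value.
all_keys_present = "Of Mice And Men".gsub(/\b(?:Of|And)\b/, lower)
# => "of Mice and Men"

# "Men" matches the pattern but is not a key, so it is replaced by ""
# (the hash's default, nil, converted to a string).
missing_key = "Of Mice And Men".gsub(/\b(?:Of|Men)\b/, lower)
# => "of Mice And "
```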
The code could use downcase, or some other computed way of reversing the capitalized words, but that adds overhead. gsub and the regex engine are very fast. Part of that is because the hash and the regex avoid looping over the stopword list in Ruby code, so that list can be huge without slowing the code much. Of course, the regex engine has changed across Ruby versions and older versions don't perform as well, so run benchmarks if you're on Ruby < 2.0.
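If you do want to measure it, a minimal sketch using Ruby's standard Benchmark library could compare the hash replacement against a block that calls downcase. The stopword subset, sample title, and iteration count are arbitrary assumptions, via_hash and via_downcase are hypothetical names, and no timings are claimed here since they depend on your Ruby version and machine.

```ruby
require 'benchmark'

# Arbitrary stopword subset for the benchmark (assumption).
STOPWORDS = Hash[%w{the a of in my}.map { |w| [w.capitalize, w] }]
STOP_RE = /(?!^)\b(?:#{STOPWORDS.keys.join('|')})\b/

# Hash replacement: gsub looks each match up in STOPWORDS.
def via_hash(str)
  str.gsub(STOP_RE, STOPWORDS)
end

# Computed replacement: a block downcases each match.
def via_downcase(str)
  str.gsub(STOP_RE) { |m| m.downcase }
end

TITLE = "The Rain In Spain Stays Mainly In The Plain."
N = 50_000

Benchmark.bm(12) do |x|
  x.report("hash:")     { N.times { via_hash(TITLE) } }
  x.report("downcase:") { N.times { via_downcase(TITLE) } }
end
```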
Upvotes: 1
Reputation: 12588
It is hard to hunt bugs in suboptimal code. Do it in a canonical way, and make possible errors easy to spot.
class String
  SQUELCH_WORDS = %w{the a by on for of are with just but and to the my had some in}
  def titleize
    gsub(/\w+/) do |s|
      SQUELCH_WORDS.include?(s) ? s : s.capitalize
    end
  end
end
"20,000 miles under the sea".titleize #=> "20,000 Miles Under the Sea"
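One practical reason to prefer gsub here over scan plus join, shown as a sketch with hypothetical method names: scan(/\w+/) keeps only the matched words, so rejoining them with spaces drops punctuation, while gsub rewrites each word in place and leaves everything else untouched.

```ruby
STOPWORDS = %w{the a by on for of are with just but and to my had some in}

# scan + join: rebuilds the string from the matched words only,
# so characters \w+ does not match (like the comma) are lost.
def titleize_scan(str)
  str.scan(/\w+/).map { |w| STOPWORDS.include?(w) ? w : w.capitalize }.join(' ')
end

# gsub: replaces each word in place; punctuation and spacing survive.
def titleize_gsub(str)
  str.gsub(/\w+/) { |w| STOPWORDS.include?(w) ? w : w.capitalize }
end

titleize_scan("20,000 miles under the sea") # => "20 000 Miles Under the Sea"
titleize_gsub("20,000 miles under the sea") # => "20,000 Miles Under the Sea"
```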
Upvotes: 0
Reputation: 8003
I think you are missing an end:
string.scan(/\w+/) do |word|
  if !stopwords.include?(word)
    words << word.capitalize
  else
    words << word
  end
end #<<<<add this
For the shorthand version, do this:
string.scan(/\w+/).map{|w| stopwords.include?(w) ? w : w.capitalize}.join(' ')
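The shorthand the question is reaching for is Ruby's ternary operator, condition ? value_if_true : value_if_false, and both of the question's attempts are missing the ?. Here are the two attempts with that fixed, using an example string of my own. In the second one scan is called without a block and map does the work instead, because scan with a block returns the original string rather than an array, so chaining join onto the block form would fail.

```ruby
stopwords = %w{the a by on for of are with just but and to my had some in}
string = 'jackdaws love my giant sphinx of quartz'

# Attempt 1, fixed: add the `?`. Since `<<` binds tighter than `? :`,
# each branch is a complete push onto words.
words = []
string.scan(/\w+/) { |word| !stopwords.include?(word) ? words << word.capitalize : words << word }
result1 = words.join(' ')
# => "Jackdaws Love my Giant Sphinx of Quartz"

# Attempt 2, fixed: add the `?` and map over scan's array before joining.
result2 = string.scan(/\w+/).map do |word|
  !stopwords.include?(word) ? word.capitalize : word
end.join(' ')
# => "Jackdaws Love my Giant Sphinx of Quartz"
```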
Upvotes: 3