Reputation: 33755
I have the following strings:
Chicago CPA
New York CPA
West Virginia Accountant
How do I always just chop off the last word (and the preceding whitespace) in the string, preserving all other words before the last word?
So the correct versions of the above data set would be:
Chicago
New York
West Virginia
Also, is it possible to test matching groups on Rubular or is there another online regex editor/tester that I can use to test regexes with matching groups?
Edit 1
Many of the answers are great in theory. I read them, I understand them, and when I test them on a vanilla string they seem to work. But when I try them on my data, they don't. I was stumped for a while, and I just realized why.
This is the HTML I am working on:
<h1 class="search-term">
Chicago <strong>Cpa</strong>
</h1>
So this is the text I am attempting to do the string manipulation on:
Chicago <strong>Cpa</strong>
So here is what happens when I try each of the answers below.
@Darshan's:
[56] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[57] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[58] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.match(/(.*) \w+\z/)[1]
NoMethodError: undefined method `[]' for nil:NilClass
from (pry):57:in `<class:PageCrawler>'
[59] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text[/.*(?=\s\w+\z)/]
=> nil
@Lucas's own:
[60] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[61] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[62] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.split()[0...-1].join(' ')
=> ""
@Eric's own:
[65] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[66] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[67] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.split().reverse.drop(1).reverse.join(" ")
=> ""
@Casimir's own (this one is the best so far, actually):
[68] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[69] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[70] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.sub(/\W+\w+\W*$/, '')
=> "Chicago"
@Santosh's own:
[71] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text
=> "Chicago Cpa"
[72] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text.class
=> String
[73] pry(YPCrawler::PageCrawler)> @document.css('header h1.search-term').first.text[/(.*)\s/,1]
=> nil
My apologies for not doing this earlier, but I didn't anticipate this being an issue.
Upvotes: 0
Views: 174
Reputation: 7959
You can use the regex /^(.*)\s+\w+\s*$/ to capture everything but the last word:
Example:
str = <<~EOF
Chicago CPA
New York CPA
West Virginia Accountant
EOF
str.each_line do |line|
  puts line.match(/^(.*)\s+\w+\s*$/).captures.first
end
Output:
Chicago
New York
West Virginia
Upvotes: 0
Reputation: 34031
I'll preface by saying I'm not particularly good with regular expressions, and I'm not sure off the top of my head (nor do I feel inclined to benchmark or think hard about) whether this would tend to be more or less efficient than @LucasP's non-regex approach. But this is the obvious approach that comes to mind for me:
s.match(/(.*) \w+\z/)[1]
That matches at the end of the string one or more word characters preceded by a space, and puts everything before that into a group that you then grab.
data = ['Chicago CPA',
'New York CPA',
'West Virginia Accountant']
data.map{|s| s.match(/(.*) \w+\z/)[1]}
# => ["Chicago", "New York", "West Virginia"]
Edit: A variant on this approach, suggested by @CarySwoveland, is to use a lookahead expression to ignore the part we want to discard, rather than my initial approach of putting the part we want into a capturing group that we then access. Here's a version of that approach:
data.map{|s| s[/.*(?=\s\w+\z)/]}
# => ["Chicago", "New York", "West Virginia"]
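One caveat with the capture-group form: match returns nil when nothing matches, so indexing the result with [1] raises NoMethodError (as in the pry session in the question). A minimal sketch of a nil-safe variant, using String#[] with a capture index; the method name city_part is only illustrative:

```ruby
# Return everything before the last word, or nil when the pattern
# does not match (for example, a single-word string).
# `city_part` is a hypothetical helper name, not from the answer above.
def city_part(s)
  # String#[] with (regexp, capture_index) returns nil instead of raising
  # when there is no match.
  s[/(.*) \w+\z/, 1]
end

p city_part('New York CPA')   # "New York"
p city_part('Chicago')        # nil
```

Callers can then decide what to do with nil instead of rescuing an exception.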
Edit 2: With your added information, it's now clear that the issue you were facing is that you have non-breaking spaces, which \s doesn't match (in Ruby, \s only matches ASCII whitespace, equivalent to [ \t\r\n\f\v]). So using either the POSIX bracket expression [[:space:]] or explicitly matching \u00A0 for the non-breaking space character works, assuming all are non-breaking spaces. I prefer the former, since you might have other whitespace there sometimes:
data.map{|s| s[/.*(?=[[:space:]]\w+\z)/]}
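To see the difference for yourself, here is a small sketch, assuming the scraped text has a single U+00A0 between the two words. The text value is hand-built for illustration rather than taken from the actual page:

```ruby
# Hypothetical stand-in for the scraped heading: the words are separated
# by a non-breaking space (U+00A0) instead of an ordinary space.
text = "Chicago\u00A0Cpa"

# Inspecting the codepoints reveals the hidden character (160 == 0xA0).
codepoints = text.codepoints

# \s only covers ASCII whitespace, so the lookahead never matches.
with_backslash_s = text[/.*(?=\s\w+\z)/]

# [[:space:]] is Unicode-aware and does match the non-breaking space.
with_posix_space = text[/.*(?=[[:space:]]\w+\z)/]

p codepoints.include?(0xA0)  # true
p with_backslash_s           # nil
p with_posix_space           # "Chicago"
```

Checking codepoints (or calling inspect on the string) is a quick way to confirm what kind of whitespace you actually have before picking a pattern.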
Upvotes: 4
Reputation: 2248
Try the following:
str = ['Chicago CPA', 'New York CPA', 'West Virginia Accountant']
str.map{|s| s[0...s.rindex(' ')]}
output: ["Chicago", "New York", "West Virginia"]
Using a regexp:
str2 = "West Virginia Accountant"
p str2[/(.*)\s/,1]
output: "West Virginia"
Upvotes: 0
Reputation: 89557
Assuming you have more than one word, you can use a replacement:
'West Virginia Accountant'.sub(/\W+\w+\W*$/, '')
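A short sketch of how this replacement behaves on a few inputs, including the single-word case mentioned above and a string with a non-breaking space like the one in the question. The method name chop_last_word is just for illustration:

```ruby
# Remove the last word plus the non-word characters around it.
# `chop_last_word` is a hypothetical helper name.
def chop_last_word(s)
  s.sub(/\W+\w+\W*$/, '')
end

p chop_last_word('West Virginia Accountant')  # "West Virginia"
# \W matches any non-word character, so a non-breaking space is covered too.
p chop_last_word("Chicago\u00A0Cpa")          # "Chicago"
# With a single word there is no leading \W+, so nothing is replaced.
p chop_last_word('Chicago')                   # "Chicago"
```

Because \W is broader than whitespace, this would also strip punctuation sitting next to the last word, which may or may not be what you want.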
Upvotes: 1
Reputation: 303
One way of achieving this is the following:
myString.split()[0...-1].join(' ')
Where myString is each string you want to perform this operation on.
First you split the string into an array containing each word.
Then you select the subarray that contains all elements except the last one.
Finally you join that array back into a string.
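The three steps can be sketched with the intermediate values spelled out. The method name drop_last_word is only illustrative:

```ruby
# Drop the last whitespace-separated word from a string.
# `drop_last_word` is a hypothetical helper name.
def drop_last_word(my_string)
  words = my_string.split   # "New York CPA" -> ["New", "York", "CPA"]
  kept  = words[0...-1]     # everything but the last element -> ["New", "York"]
  kept.join(' ')            # back to a string -> "New York"
end

p drop_last_word('New York CPA')  # "New York"
p drop_last_word('Chicago')       # ""
```

Note that a single-word input yields an empty string, and that split followed by join(' ') collapses any run of whitespace between words into one space.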
Upvotes: 2